Blog #7: What are Containers?

What are Containers?

Containers are a method of operating system virtualization that allows you to run an application and its dependencies in resource-isolated processes. They let you package an application’s code, configurations, and dependencies into easy-to-use building blocks that deliver environmental consistency, operational efficiency, developer productivity, and version control. Containers can help ensure that applications deploy quickly, reliably, and consistently regardless of deployment environment. They also give you more granular control over resources, which improves the efficiency of your infrastructure. Running containers in the AWS Cloud allows you to build robust, scalable applications and services by leveraging the benefits of the AWS Cloud, such as elasticity, availability, security, and economies of scale. You also pay only for the resources you actually use.

Containers and Virtual Machines

[Diagram: virtual machines compared with containers]

Benefits of Containers

 


Environment Consistency

Containers enable portability and help reduce the organizational and technical frictions of moving an application through the development, testing, and production lifecycle. Containers encapsulate all the necessary application files and software dependencies and serve as a building block that can be deployed on any compute resource regardless of software, operating system, or hardware configurations (e.g., you can run the same container on your Ubuntu laptop and on your Red Hat Enterprise Linux production servers). Whatever you package as a container locally will deploy and run the same way whether in testing or production. This is beneficial for you and your organization because you can deploy an application reliably and consistently regardless of environment. This helps you avoid manually configuring each server and allows you to release new features faster.

 


Operational Efficiency

Containers can help you get more from your computing resources by allowing you to easily run multiple applications on the same instance. With containers, you can specify the exact amount of memory, disk space, and CPU to be used by a container on an instance. Containers have fast boot times because each container is only a process on the operating system running an application and its dependencies. This reduced footprint enables you to quickly create and terminate applications or tasks encapsulated in a container, allowing you to rapidly scale applications up and down. You can use blue-green deployment patterns to roll out new application versions (e.g., using Amazon Elastic Container Service) because the entire application and all its dependencies are contained in an image. Containers also provide process isolation, which allows you to put each application and its dependencies into a separate container and run them on the same instance. There are no shared dependencies or incompatibilities because each container is isolated from the others (e.g., you can run two containers that use different library versions on the same Amazon EC2 instance).
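To make the resource-control point concrete, here is a minimal sketch using the Docker SDK for Python (the "docker" package, assumed to be installed alongside a local Docker daemon); the image name "myapp" and the limits themselves are purely illustrative.

    # Minimal sketch: run two isolated copies of the same (hypothetical) image
    # on one host, each with its own memory and CPU limits.
    import docker

    client = docker.from_env()  # connect to the local Docker daemon

    for name, mem, cpus in [("myapp-a", "256m", 0.5), ("myapp-b", "512m", 1.0)]:
        client.containers.run(
            "myapp:latest",             # hypothetical image name
            name=name,
            detach=True,                # return immediately, leave it running
            mem_limit=mem,              # hard memory cap for this container
            nano_cpus=int(cpus * 1e9),  # CPU allowance, in billionths of a CPU
        )

The two containers share nothing but the host's kernel, so they can happily use different library versions while running on the same instance.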

You can also create container images that serve as the base for other images. Operations teams can create a base image composed of the operating system, configurations, and the various utilities they want. Development teams can then build their application on top of the base image. This allows you to avoid the complexities of server configuration.

 


Developer Productivity

Containers increase developer productivity by removing cross-service dependencies and conflicts. Each application component can be broken out into its own container running a separate microservice. Containers are isolated from one another, so you don’t have to worry about keeping libraries or dependencies in sync for each service. Developers can independently upgrade each service because there are no library conflicts.

 


Version Control

Containers allow you to track versions of your application code and its dependencies. Docker container images are built from a Dockerfile, a plain-text build recipe that can be kept under version control, making it easy to maintain and track versions of a container, inspect differences between versions, and roll back to previous versions.
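To illustrate the roll-back idea, here is a small, hypothetical sketch using the Docker SDK for Python; the repository name "myorg/myapp" and the version tags are made up for the example.

    # Hypothetical roll-back: stop the current version and start the previous tag.
    import docker

    client = docker.from_env()

    # Deploy version 1.3 of the application...
    client.containers.run("myorg/myapp:1.3", name="myapp", detach=True)

    # ...and, if it misbehaves, roll back to the previously tagged image.
    current = client.containers.get("myapp")
    current.stop()
    current.remove()
    client.containers.run("myorg/myapp:1.2", name="myapp", detach=True)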

 

Cloud is about how you do computing, not where you do computing.

Blog #6 : Serverless Architecture

“No server is easier to manage than no server.”  — Werner Vogels, Amazon CTO

A contradiction? The concept might be causing you some confusion – “IT infrastructure without servers”?

As is so often the case, the answer is “not exactly”!

The term “serverless” in this context doesn’t strictly mean without servers. It means you no longer have to think about them like you once did.

Traditionally, the code that makes up applications would run on physical servers. Until recently, a Cloud-based architecture would mirror this, making use of virtualised servers in place of the physical ones. Several Cloud providers have now started offering services that allow you to upload your code into the Cloud, where it can be executed based on events or schedules you specify, without having to provision any servers at all. Instead, the Cloud provider creates the execution environment for the code as needed.

This means that you no longer have to worry about scaling, capacity and updates. Deploying code also becomes simpler since there is no need to create virtual machine images anymore. Perhaps best of all, you don’t need to have servers running all the time costing you money. You are only charged while the code is executing, and many tasks complete in seconds, or even milliseconds. This means that moving from a traditional server-based architecture to a serverless architecture has the potential to generate huge savings.

It is probably fair to say that the front-runner in providing serverless tools is Amazon Web Services (AWS) <https://aws.amazon.com/>. The main service that AWS provide for facilitating serverless architectures is AWS Lambda <https://aws.amazon.com/lambda/>.

As described above, this allows code to be uploaded into your AWS account. The code can then be executed in response to events generated by other AWS services. For example, you could configure code to run in response to image files being uploaded to an S3 <https://aws.amazon.com/s3/> bucket. This code could produce a watermarked version of the image and put it in another bucket for use by another piece of code. All of this processing can be done without creating any AWS EC2 <https://aws.amazon.com/ec2/> instances at all. AWS Lambda code can also be run on a specified schedule, allowing you to create overnight batch processing jobs, for instance.
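As a rough sketch of what such a function might look like in Python, the handler below reads each uploaded object and writes a processed copy to a second bucket; the destination bucket name and the watermarking step are placeholders.

    # Sketch of a Python Lambda handler triggered by S3 "object created" events.
    import boto3

    s3 = boto3.client("s3")

    def add_watermark(image_bytes):
        # Placeholder: real code would use an imaging library here.
        return image_bytes

    def lambda_handler(event, context):
        for record in event["Records"]:            # one record per uploaded object
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            s3.put_object(
                Bucket="watermarked-images",       # hypothetical destination bucket
                Key=key,
                Body=add_watermark(original),
            )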

To make AWS Lambda even more useful, it can also tie in with the AWS API Gateway <https://aws.amazon.com/api-gateway/> service. In essence, this allows Lambda code to respond to HTTP web requests, which means it is possible to build serverless, web-based RESTful APIs. In addition, by using S3 to serve static content, entirely serverless web applications and sites can be created.
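With API Gateway's Lambda proxy integration (one way of wiring this up), the function simply returns a dictionary describing the HTTP response. A minimal sketch:

    # Minimal Python Lambda handler behind an API Gateway proxy integration.
    import json

    def lambda_handler(event, context):
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": "Hello from a serverless API"}),
        }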

As an example, we recently created an internal monitoring tool in which we architected out the use of AWS EC2 instances entirely. Here we used a range of AWS services in combination with other technologies:

Serverless

  • AWS S3 <https://aws.amazon.com/s3/>: We stored our front-end static web page content and images in an S3 bucket. The pages use JavaScript to talk to a serverless backend. When clients request static content it is served by S3.
  • AWS API Gateway <https://aws.amazon.com/api-gateway/>: Using the API Gateway service we were able to create routes to Lambda functions. The Lambda functions are then invoked in response to normal GET and POST requests. In effect, the API Gateway routes act as triggers that run the Lambda functions.
  • AWS Lambda <https://aws.amazon.com/lambda/>: This is the core AWS service for deploying applications using a serverless architecture. We wrote the core components of the application in Python and deployed them to AWS as Lambda functions (a rough sketch of one such function follows this list).
  • AWS DynamoDB <https://aws.amazon.com/dynamodb/>: This managed NoSQL database service is not required for a serverless deployment, but we made use of it because it integrates very smoothly with other AWS services. DynamoDB is extremely flexible and near infinitely scalable.
  • AWS CloudWatch <https://aws.amazon.com/cloudwatch/>: We used CloudWatch to monitor the deployed AWS resources that make up the application.
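As a rough illustration of how these pieces fit together, the sketch below shows the shape of one such handler: an API Gateway-invoked Lambda function that records a result in DynamoDB. The table and attribute names are hypothetical.

    # Rough sketch: a Lambda function (invoked via API Gateway) that stores
    # a monitoring result in a hypothetical DynamoDB table.
    import json
    import time
    import boto3

    table = boto3.resource("dynamodb").Table("monitoring-results")

    def lambda_handler(event, context):
        body = json.loads(event.get("body") or "{}")
        table.put_item(Item={
            "check_name": body.get("check_name", "unknown"),
            "timestamp": int(time.time()),
            "status": body.get("status", "unknown"),
        })
        return {"statusCode": 200, "body": json.dumps({"stored": True})}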

Serverless architecture has to be one of the most exciting and innovative concepts to grow out of Cloud computing and has real potential to reduce the complexity, management overheads and cost of running applications.

It also undoubtedly gets you lots of kudos if you can say that your web application is running without using any servers at all!

Cloud is about how you do computing, not where you do computing.

Blog #5 : Infrastructure as code

One of the first things about Cloud computing that really fired up my imagination was the concept of infrastructure as code. The phrase instantly conjured up the merging of two worlds in a very exciting way. As I looked into it more, I realised that this concept is clearly one of the cornerstones of the public Cloud (and of DevOps as well, but that is probably best left for another blog post).

This all sounds wonderful, but what does it actually mean? To understand it, it is easiest to consider the two parts of the term separately. In a traditional IT environment the infrastructure consists of the servers and storage that run the applications a business uses on a daily basis, as well as the networking components that plumb everything together. The code is the software, written in a programming language, that actually makes up these applications. The infrastructure would be physical, but the code would be stored digitally and, importantly, could be backed up, copied and kept in multiple versions.

Virtual Infrastructure 

In the world of the Cloud, however, the hardware infrastructure components are (from the user’s point of view, anyway) virtual. If you need a new server, you just log into your account and, with a few clicks of your mouse, you can have it up and running. The concept of infrastructure as code takes this a step further and allows the virtual, Cloud-based infrastructure to be described textually in a way very reminiscent of computer code. This textual description, or template, can then be used to request the infrastructure.

This has many remarkable and powerful implications. It means that your infrastructure is now reproducible at the click of a button, which is great for producing test and development environments that are identical to a given production environment. It removes a large degree of human error from the process. Your infrastructure is also documented in a single location, and the templates can be stored, backed up and version controlled in the same way that computer code can be.

How to Achieve It 

To give a concrete example, Amazon Web Services (AWS) <https://aws.amazon.com/> has a service called CloudFormation <https://aws.amazon.com/cloudformation/> which allows you to describe your infrastructure using a language called JSON <https://en.wikipedia.org/wiki/JSON> (JavaScript Object Notation). You upload this template and the AWS CloudFormation service works out the dependencies between the various infrastructure components specified and requests that they be created. In AWS terms, this set of infrastructure is referred to as a stack. If you need to modify the stack, you can just make changes in the template and CloudFormation will work out what has changed and apply the changes for you.
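As a flavour of what this looks like in practice, here is a deliberately tiny, hypothetical template (a single S3 bucket) being launched as a stack with the AWS SDK for Python (boto3); you could just as easily upload the same JSON through the console.

    # Sketch: create a CloudFormation stack from an inline JSON template.
    # The stack name and the single-bucket template are purely illustrative.
    import json
    import boto3

    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ExampleBucket": {"Type": "AWS::S3::Bucket"}
        },
    }

    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(
        StackName="example-stack",
        TemplateBody=json.dumps(template),
    )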

Infrastructure as code really is a concept that sounds simple and innocuous at first, but the more you think about it, the more you realise how much potential it has to transform the way you work.

Cloud is about how you do computing, not where you do computing.

Blog #4 : Compute

The world of IT infrastructure is essentially made up of storage and compute resources plumbed together with networking equipment. In this post I am going to talk about the compute element of that mix and how public Clouds have revolutionized the provisioning and use of this vital component.

So what are these “compute resources” then? What I mean by this is the things that actually do all the hard work of computing. In the world of traditional IT these would be physical computers sitting in a big room full of cables and air conditioning (often known as a data centre <https://en.wikipedia.org/wiki/Data_center>). These computers would all have their own CPUs, memory and disks and would be running operating system software (such as Unix/Linux or Microsoft Windows). These computers would normally be referred to as “servers” (since they provide services to other computers) and one piece of physical hardware would equate to one server.

A revolution then occurred which meant that many servers could run on one piece of physical hardware. This revolution was called virtualization <https://en.wikipedia.org/wiki/Virtualization> and it allowed data centres to be transformed – fewer physical computers were required, less power and air conditioning was needed, and the provisioning of new servers became a lot more flexible. In the past, if a new server was required you would have to wait until a physical computer could be purchased, delivered and then set up. With virtualised servers it became possible to provision a new server on an existing virtual host server (and easily get rid of it again once it was no longer needed). Even so, there were still limits – the virtual hosts only had a finite amount of memory and disk space available.

The birth of the public Cloud 

Then the public Cloud came into existence and heralded another transformation in the world of servers. Services such as Amazon Web Services <https://aws.amazon.com/> and Microsoft Azure now allow you to create virtual servers in their data centres. These data centres are vast and have near unlimited capacity to host virtual machines. This kind of service is often known as “Infrastructure as a Service <http://www.gartner.com/it-glossary/infrastructure-as-a-service-iaas/>” and allows virtual machines to be created in minutes to your specifications. Additionally, you typically only pay for them for the period in which you actually use them. This increase in flexibility provides many benefits including the ability to have powerful servers available with no up-front costs at all – a great boon for start-up type companies, but useful for anyone commencing a new project. It also allows companies to have backup servers in case of a disaster, at very low cost.

To give an example, Amazon Web Services (AWS) provide Cloud-based virtual machines via their Elastic Compute Cloud EC2 <https://aws.amazon.com/ec2/> service. The EC2 service has a simple web interface which allows new instances to be created very easily and quickly. There is a large range of different instance types, including ones optimised for storage or compute performance. On AWS you pay for EC2 instances by the hour, and you can get a discount on the cost by reserving them for one or three years.
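For those who prefer code to clicking, the same thing can be done programmatically; here is a hypothetical boto3 sketch that launches a single small instance (the AMI ID is a placeholder and is region-specific).

    # Sketch: launch one small EC2 instance with boto3. The AMI ID is a placeholder.
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder image ID
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])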

Virtual machines are one of the cornerstones of Cloud computing, and the flexibility, pricing model and convenience that they provide are among the main reasons why the Cloud is permanently changing the IT landscape.

Cloud is about how you do computing, not where you do computing.

Blog #3 : Elasticity

Elasticity is a fundamental concept underpinning Cloud computing that, when deployed appropriately, can allow you to create highly responsive, reliable and cost effective applications in the Cloud.

In essence elasticity allows the capacity of a given Cloud environment to increase and decrease in response to demand.

As an example of how this might help, imagine that you have a website that normally runs happily on a single server in the Cloud but suddenly you are “slashdotted”.

What, I hear you say? https://en.wikipedia.org/wiki/Slashdot_effect

Your single web server won’t be able to cope with the massive spike in traffic and eventually collapses in a heap. Using the elasticity of the Cloud, however, extra web servers can be automatically provisioned to meet the demand. Yes – automatically, without intervention or preparation.

After the surge of extra users dies down again, the extra web servers can then be automatically decommissioned. Since Cloud services are typically purchased on a “pay for what you use” basis, this means that you can match the resources you pay for very closely with what you actually need, based on demand.

Elasticity is usually associated with horizontal scaling of capacity – in other words, adding extra servers to an existing set of servers. It is also possible to scale vertically – that is, to increase the resources (CPU, memory, etc.) available to a given server. Vertical scaling is harder to undo than horizontal scaling, however, making it less suitable for the flexible increases and decreases in capacity that elasticity is intended to address.

This is all in very stark contrast to traditional IT infrastructure, where you would have to purchase enough capacity up front to cope with demand. Most of that already-paid-for capacity would typically go unutilised for lengthy periods of time, so you are more than likely paying far more than you actually require. You are also reliant on your demand estimate model being correct!

How to achieve it 

In the world of Amazon Web Services elasticity is mainly achieved using a combination of Elastic Compute Cloud virtual servers (EC2 – https://aws.amazon.com/ec2/), the AWS Auto-scaling service (https://aws.amazon.com/autoscaling/) and AWS Elastic Load Balancers (ELB – https://aws.amazon.com/elasticloadbalancing/). It is possible to define an Auto-scaling group which maintains a certain minimum number of EC2 web or application servers and monitors an appropriate metric (typically CPU usage). The Auto-scaling group can then add additional EC2 servers to the group if the metric rises above a certain level. This group of servers can then be registered with an Elastic Load Balancer which distributes requests evenly to the servers. If the CPU load drops back down below a predefined level then the additional EC2 servers can be terminated.
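As a sketch of how the scaling rules can be expressed with boto3 (the group, launch configuration and subnet names below are hypothetical and assumed to exist already; a target-tracking policy is just one way to encode the CPU rule described above):

    # Sketch: an Auto-scaling group of web servers that tracks average CPU.
    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-servers",
        LaunchConfigurationName="web-server-config",  # assumed to exist
        MinSize=2,                                    # never fewer than two servers
        MaxSize=10,                                   # cap on scale-out
        VPCZoneIdentifier="subnet-0123abcd",          # placeholder subnet ID
    )

    # Add or remove instances automatically to hold average CPU near 60%.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-servers",
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 60.0,
        },
    )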

Elasticity is a fundamental concept underpinning Cloud computing that, when deployed appropriately, can allow you to create highly responsive, reliable and cost effective applications in the Cloud.

 

Cloud is about how you do computing, not where you do computing.

Blog #1 : The Cloud

The Cloud is a pretty nebulous (excuse the pun!) concept that is quite difficult to sum up in a short blog post such as this, but I said I was up for a challenge, so here goes.

I’m going to restrict myself here to talking about the kind of Infrastructure-as-a-Service type of Cloud offerings (e.g. Amazon Web Services or Microsoft Azure) rather than Software-as-a-Service products such as Salesforce.

The modern public Cloud, then, is essentially internet-based, on-demand computing. Traditionally, if you needed a new server to run some software, you would need to go through the process of ordering one, waiting for it to arrive, configuring it, and then housing and maintaining it. This would typically take weeks and involve a significant amount of up-front capital expenditure as well as on-going running costs.

With the Cloud you can instead log into a website and start up a virtual server (see https://en.wikipedia.org/wiki/Virtual_machine – a concept that will be covered in more detail in a future post in this series) in a matter of a few minutes. You only pay for this server whilst it is being used, with no up-front cost at all. Virtual servers are just the tip of the iceberg, however, with Cloud providers offering a massive variety of services including file storage, databases and virtual networking.

This means that the Cloud has transformed IT infrastructure from an expensive burden into a utility that can be consumed like electricity or gas. In the early days of the Cloud (Amazon launched its Cloud services in 2006) this kind of service was perhaps most useful to start-up type companies who were able to take advantage of the low initial cost, combined with the ability to scale out their Cloud based infrastructures at the click of a button. Now however, as the Cloud providers’ services have expanded and matured, companies of all shapes and sizes can, and do, make use of Cloud based services.

Although the concepts touched upon here will be expanded upon in future posts it might be useful to consider the core services offered by Amazon Web Services to give some concrete examples of what is on offer.

Amazon’s virtual server service is called Elastic Compute Cloud (EC2). This service allows virtual servers to be created on an “as-needed” basis with a flexible amount of memory and CPU capacity, running operating systems such as Linux and Windows. For storage, Amazon offers its Simple Storage Service (S3). This service provides highly durable and cost-effective file storage that is easy to access and use. In terms of databases, Amazon has its Relational Database Service (RDS). This allows the creation of managed databases based upon common technologies such as Microsoft SQL Server, MySQL, Oracle and PostgreSQL.
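To give a feel for how approachable these services are, here is a tiny, hypothetical boto3 sketch that stores and then reads back a file in S3 (the bucket name is a placeholder and must already exist):

    # Tiny sketch: store and fetch an object in S3 with boto3.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket="my-example-bucket", Key="hello.txt", Body=b"Hello, Cloud")
    obj = s3.get_object(Bucket="my-example-bucket", Key="hello.txt")
    print(obj["Body"].read().decode())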

The “Cloud” used to be a buzzword that many people found very easy to dismiss as a fad but it is instead growing at an amazingly rapid rate and will soon become so ubiquitous that people will struggle to remember a time without it.

This article is part of our ongoing Cloud simplified series where we seek to delve into the meaning behind widely used, but little understood concepts in Cloud computing.

Cloud is about how you do computing, not where you do computing.

Blog #2 : Is the data warehouse dead?

Big data has to be one of the most over-hyped and over-used phrases during the last few years. The ability to analyse near unimaginably large sets of data brings the promise of gaining amazing insights that were previously out of reach. As with many buzz-words it is quite difficult to find a clear definition of big data. One way of looking at it however is as data sets that are too large to be processed by traditional tools such as relational databases. If relational databases are no use for big data then what should we use instead? One of the answers is the data warehouse.

Data warehouses are essentially a form of database, but differ from more familiar relational databases in a number of significant ways. A traditional relational database is typically used for applications such as online shopping websites. These databases are optimised for large numbers of short transactions (imagine the number of sales taking place on a typical well-known brand’s website every minute) and for maintaining data integrity (vital for financial transactions).

Data warehouses, on the other hand, are designed to facilitate data analysis. This means that they need to deal with a much lower volume of transactions than a relational database, but the queries used are often much more complex and frequently involve aggregating large amounts of data.

The major problem with data warehouses is that they are typically very expensive and time consuming to procure and configure. In an attempt to tackle this problem, public Cloud providers have started to make managed data warehouse solutions available. These have the advantage of being very quick to provision (less than a day as opposed to weeks or months for an on-premises solution) and having no up-front costs. Being a managed service, you also do not need to worry about applying patches and updates, and backups are typically handled automatically as well.

For example, Amazon Web Services (AWS) <https://aws.amazon.com/> provides Amazon Redshift <https://aws.amazon.com/redshift/>. This is a fully managed, Cloud based data warehouse service designed for analysis of large data sets. Redshift is a columnar <https://en.wikipedia.org/wiki/Column-oriented_DBMS> based data warehouse which can scale from a few hundred gigabytes to petabytes or more. It is perfect for loading huge amounts of data from multiple disparate sources and then analysing it. It can ingest data very efficiently using a parallel load process from AWS S3 <https://aws.amazon.com/s3/> buckets, but it also integrates well with other AWS services such as Data Pipeline <https://aws.amazon.com/datapipeline/>, Kinesis <https://aws.amazon.com/kinesis/> and Amazon Machine Learning <https://aws.amazon.com/machine-learning/>. Third-party data analysis tools such as Looker <https://looker.com/> can also easily be used in conjunction with Redshift.
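As a flavour of the parallel load process mentioned above, here is a hypothetical sketch that connects to a Redshift cluster with the psycopg2 driver and issues a COPY from S3; the cluster endpoint, credentials, table, bucket and IAM role are all placeholders.

    # Hypothetical sketch: bulk-load CSV files from S3 into a Redshift table.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="admin",
        password="replace-me",
    )

    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY sales
            FROM 's3://example-bucket/sales/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS CSV;
        """)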

Data warehouses can be a near essential tool for the analysis of large volumes of data.

Using a Cloud-based solution allows you to provision a very powerful “warehouse” in a fraction of the time, and at a fraction of the cost, of a traditional on-premises solution. This makes it an obvious choice, allowing you to get started with your analysis both quickly and cheaply.

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.