AWS Well-Architected Framework and a Cloud Ready System

Upendra Kumarage
8 min read · Jun 2, 2020

Everyone is talking about the cloud today: migrating to a cloud service, deploying new applications in the cloud, using cloud-based storage, compute, and API solutions, and so forth. Individuals and organizations use the services provided by Cloud Service Providers for purposes ranging from simple storage solutions to mission-critical system deployments. In these hasty deployments, I have seen instances where the resources and services provided by Cloud Service Providers are not used optimally or efficiently. This could be due to a lack of awareness of resources and their usage, estimation errors, an unplanned operational structure, or deployment architectures that are not cloud ready.

Thinking of the above-mentioned shortcomings and the guidelines provided by Cloud Service Providers, I believe the AWS Well-Architected Framework is a good place to start planning a cloud-based deployment architecture. Whichever Cloud Service Provider you choose, the AWS Well-Architected Framework provides general insight into a cloud-based deployment. GCP also has a similar set of guidelines, which you can find as the Principles for Cloud Native Architecture [3]. I have very little experience with the Azure platform, so I will not discuss Azure here.

So, what is the AWS Well-Architected Framework? As per their introduction [1]:

The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS

This framework provides insight into the major factors you need to focus on when developing and deploying your solution on AWS. But as mentioned earlier, it can generally be used as a guideline for planning a deployment in any cloud. The AWS Well-Architected Framework is built upon five pillars, each of which comes with its own design principles and best practices. The five pillars are:

1. Operational Excellence

2. Security

3. Reliability

4. Performance Efficiency

5. Cost Optimization

These five pillars have proven to be the foundation of any reliable, secure, efficient, and cost-effective system deployed in the cloud. Let's take a brief look at these five pillars and their overall applicability to building a cloud-ready system.

Operational Excellence

The ultimate goal of deploying our system in a cloud service is operational excellence, which in return generates the business value of the system. The Operational Excellence pillar explains how we can generate the intended business value by monitoring the system to continually improve its supporting processes and procedures. I am not going to explain the six design principles of Operational Excellence in detail here, as that would lengthen the article and would merely rewrite what AWS has already written in their white papers [5].

So, how are we supposed to achieve Operational Excellence? It is always good to start with code. Always avoid the manual creation of resources. We are deploying our system in the cloud; unlike in a traditional on-premises setup, we have the flexibility to maintain our entire setup as code. We will have a self-documented, versioned code base of our entire deployment setup, which will make future changes more efficient while providing an easy rollback process.
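As a minimal sketch of the infrastructure-as-code idea, the snippet below builds a CloudFormation-style template as plain Python data and serializes it to JSON. The resource name and properties are hypothetical example values; in practice you would hand such a template to CloudFormation, or use a tool like Terraform or the AWS CDK, and keep it under version control.

```python
import json

def build_template() -> str:
    """Build a minimal CloudFormation-style template as plain data.

    Keeping the template in code (and in version control) gives us a
    self-documenting, reviewable, and easily reverted deployment setup.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example web tier (hypothetical resources)",
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {"InstanceType": "t3.micro"},
            },
        },
    }
    return json.dumps(template, indent=2)

print(build_template())
```

Because the template is just data in a repository, rolling back a bad change is a `git revert` followed by a redeploy, rather than undoing console clicks by memory.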

Another important thing we have to take into account is that our design should be flexible to change. The AWS documentation suggests making small, frequent changes to the setup that can be reversed if required. This way, our entire setup stays up to date and grows effortlessly.

The most important thing in a cloud-based system deployment is anticipating failure scenarios. This is very critical. I have seen occasions where an entire system went down for several hours because of an automatic infrastructure upgrade pushed by the Cloud Service Provider. In the cloud, we do not have control over the underlying infrastructure. We have also seen plenty of cases of permanent data loss due to data center failures, so we need to plan to avoid such scenarios. We can use tools like Chaos Monkey [4] for resilience testing. This will keep us informed of potential failures and their impact on the system.
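The idea behind tools like Chaos Monkey can be sketched in a few lines: inject failures deliberately so the recovery path is exercised before a real outage forces it. Everything here is a toy stand-in (the flaky service, the failure rate, and the cache fallback are all made up for illustration):

```python
import random

def flaky_service(fail_rate: float, rng: random.Random) -> str:
    """A pretend downstream call that fails some fraction of the time."""
    if rng.random() < fail_rate:
        raise ConnectionError("injected failure")
    return "ok"

def call_with_fallback(fail_rate: float, rng: random.Random) -> str:
    """Exercise the failure path: degrade gracefully instead of crashing."""
    try:
        return flaky_service(fail_rate, rng)
    except ConnectionError:
        return "served-from-cache"  # hypothetical degraded-mode response

rng = random.Random(42)  # seeded so the run is repeatable
results = [call_with_fallback(0.3, rng) for _ in range(10)]
print(results.count("served-from-cache"), "of 10 calls hit the fallback")
```

The point is that the fallback branch runs regularly under injected failure, so we find out in testing, not in production, whether it actually works.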

Finally, we have to plan the overall operations procedure for the deployment. This ought to be a well-established, well-documented process covering who owns what, what the escalation points are, and so forth. It needs to be prepared in advance and to evolve over time. It would be a total waste of resources if we do not plan to evolve, because the Cloud Service Provider definitely will, and we would end up sitting on outdated resources that are set to be terminated.

Security

Security is one of the key aspects we have to consider when planning our deployment architecture. We should be able to protect the information, systems, and other assets in our deployment. Any vulnerability can be exploited and may result in critical incidents.

Security can be divided into two sections: the security of the system itself, and the security of the deployment. The security of the deployment should start with privilege management, enforcing the principle of least privilege and separation of duties with appropriate authorization to the cloud resources. Both AWS and GCP provide rich Identity and Access Management (IAM) features to implement this.
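To make least privilege concrete, here is a sketch of an IAM policy document built as plain data. The JSON shape (Version, Statement, Effect, Action, Resource) follows the standard AWS IAM policy grammar; the bucket name is a hypothetical example. Note what is absent: no wildcard actions like `s3:*` and no `"Resource": "*"`.

```python
import json

def read_only_bucket_policy(bucket: str) -> dict:
    """Grant only the two S3 read actions this workload needs.

    Least privilege means naming the exact actions and the exact
    resources, instead of reaching for wildcards.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",        # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",      # the objects in it
                ],
            }
        ],
    }

print(json.dumps(read_only_bucket_policy("example-reports"), indent=2))
```

A separate role with write access would get its own, equally narrow policy, which is the separation-of-duties half of the principle.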

Traceability is the next important thing in security planning. We have to ensure that every action is traceable and accountable. This provides an audit mechanism and visibility into the actions performed on our cloud resources.

In most cloud services, we can apply security at several layers, such as the VPC level, subnet level, storage level, compute resource level, and so forth. It is advisable to plan your security features across all of these layers rather than relying on a single layer.
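This defense-in-depth idea reduces to a simple rule: a request must pass every layer, and no single control is trusted on its own. The sketch below illustrates that with three hypothetical checks standing in for network-level, subnet-level, and identity-level controls:

```python
from typing import Callable

Request = dict
Layer = Callable[[Request], bool]

def allowed(request: Request, layers: list[Layer]) -> bool:
    """Defense in depth: every layer must independently permit the request."""
    return all(layer(request) for layer in layers)

# Hypothetical stand-ins for VPC network ACLs, security groups, and IAM.
network_acl: Layer = lambda r: r["src_ip"].startswith("10.")  # private range only
security_group: Layer = lambda r: r["port"] == 443            # HTTPS only
iam_check: Layer = lambda r: r["role"] == "reader"            # authorized role

req = {"src_ip": "10.0.1.5", "port": 443, "role": "reader"}
print(allowed(req, [network_acl, security_group, iam_check]))  # True
```

If any one layer is misconfigured, the others still stand between an attacker and the resource, which is exactly what a single-layer design loses.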

When it comes to protecting data, Cloud Service Providers give us extensive ways to protect our data, including encryption mechanisms, tokenization, and access control mechanisms. We need to define the level of security our data needs and use the appropriate mechanism to protect it.
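Tokenization is the least familiar of those three, so here is a toy sketch of the concept: the sensitive value is swapped for an opaque token, and only the tokenization service can map the token back. The in-memory vault here is purely illustrative; a real system would use a managed tokenization service or a KMS-backed store.

```python
import secrets

class Tokenizer:
    """Toy tokenization: replace a sensitive value with an opaque token.

    Systems downstream handle only the token; the real value lives solely
    inside the (here: in-memory, illustrative) vault.
    """

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random, reveals nothing
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

t = Tokenizer()
tok = t.tokenize("4111-1111-1111-1111")  # e.g. a card number
print(tok.startswith("tok_"), t.detokenize(tok) == "4111-1111-1111-1111")
```

Unlike encryption, the token has no mathematical relationship to the original value, so a leak of tokenized data alone reveals nothing.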

Finally, it is always good to prepare your system for security incidents. We need a properly designed incident management process, along with an escalation matrix, for incident response.

Reliability

The definition AWS gives for Reliability is [5]:

The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues

I have quoted the definition directly here because it is the most accurate and simplest explanation of Reliability I have seen.

Unlike a traditional on-premises deployment, in the cloud we have the ability to scale our resources horizontally on demand, as well as to provide high availability effortlessly. What we need to do is apply these capabilities to our deployment architecture.

Implementing reliability in our deployment should start with eliminating the guesses we make about resources. We can monitor or predict the demand on and utilization of our resources and launch our system accordingly; when demand increases, we can scale our resources out. In parallel, we can make sure high availability is built in alongside the estimated resources. In both AWS and GCP, each region has multiple Availability Zones, which we can use to ensure the high availability of a resource.
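The scaling decision itself is simple arithmetic, in the spirit of a target-tracking policy: size the fleet so per-instance utilization lands near a target. This is a sketch with example numbers, not any provider's actual algorithm; note the floor of two instances, which keeps the fleet spread across at least two Availability Zones.

```python
import math

def desired_capacity(current: int, utilization: float,
                     target: float = 0.6,
                     min_n: int = 2, max_n: int = 10) -> int:
    """Size the fleet so per-instance utilization approaches `target`.

    min_n >= 2 keeps instances spread across at least two Availability
    Zones; max_n caps runaway scaling (and runaway cost).
    """
    wanted = math.ceil(current * utilization / target)
    return max(min_n, min(max_n, wanted))

print(desired_capacity(4, 0.90))  # demand spike: scale out
print(desired_capacity(4, 0.15))  # quiet period: scale in, floor at 2
```

Replacing guesswork with a measured signal (here, `utilization`) is exactly the "stop guessing capacity" principle: launch small, observe, and let the numbers drive the fleet size.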

As a final note on Reliability, we need to ensure that when a failure occurs, there is an automatic mechanism to recover from it. We can make use of the available recovery and backup mechanisms to address such cases. Unlike an on-premises deployment, we can test how the system fails and validate our recovery procedures.
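A minimal sketch of such an automatic recovery loop is retry with exponential backoff: keep probing a failed dependency, waiting a little longer each time, until it comes back or we give up and escalate. The failing-then-recovering health check below is simulated for illustration.

```python
import time

def recover(action, attempts: int = 3, base_delay: float = 0.01):
    """Retry a failing action with exponential backoff.

    A tiny stand-in for automatic recovery: in a real deployment this
    role is played by health checks replacing failed instances,
    restoring from backup, and so on.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure for escalation
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...

# Simulate a resource that fails twice, then comes back healthy.
state = {"calls": 0}
def health_check():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("still recovering")
    return "healthy"

print(recover(health_check))  # "healthy" on the third attempt
```

The same structure lets us validate recovery in a test: make the dependency fail on purpose and assert that the system heals without a human in the loop.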

Performance Efficiency

Here too, I would like to quote the definition provided by AWS directly [5]:

The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve

This pillar discusses the optimal use of our cloud resources' performance. This can vary from computing resources to serverless architectural deployments.

To implement performance efficiency, we can pick from the enormous range of advanced services provided by Cloud Service Providers, rather than deploying our system in the cloud in a traditional way. What we need to do is plan our application to utilize these services. This will definitely aid in evolving our system over time, as we are deploying the system cloud-natively using the advanced services provided by Cloud Service Providers.

We can look at services such as IoT, managed container services, machine learning, and so forth, and plan our deployment to utilize them rather than traditional computing resources.

Cost Optimization

The Cost Optimization pillar is the last of the five pillars of the AWS Well-Architected Framework. The main reason an organization migrates to a cloud service is to reduce the operational and resource costs of its data centers. The Cost Optimization pillar provides a set of principles for running our system at the lowest price point while providing optimal output.

We should start with correct resource estimation. This eliminates extra costs, so we pay only for the resources we require. Usage will increase or decrease depending on business requirements, and we can estimate resource usage using methods like load tests and performance testing. Techniques such as applying lifecycle policies to storage, and reducing the use of block storage in favor of object storage such as S3 or GCP Cloud Storage, will dramatically reduce the cost.
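A lifecycle policy is just a small piece of declarative configuration. The sketch below builds one in the shape S3 expects (transitions to cheaper storage classes as objects age, then expiration); the prefix and day counts are example values to tune for your own data.

```python
import json

def archive_lifecycle(prefix: str = "logs/") -> dict:
    """An S3-style lifecycle configuration: move aging objects to
    cheaper storage classes, then expire them entirely.

    Prefix and day counts are illustrative, not recommendations.
    """
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold archive
                ],
                "Expiration": {"Days": 365},  # delete after a year
            }
        ]
    }

print(json.dumps(archive_lifecycle(), indent=2))
```

Once applied to a bucket, the provider enforces the policy automatically, so the savings compound without anyone remembering to clean up old logs.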

Cloud Service Providers supply cost forecasts and resource usage reports. Their content may vary, but we can still optimize cost periodically by referring to these reports. Cost optimization is not a one-day task but one that runs over the lifetime of our deployment, and it is intertwined with the rest of the four pillars. Choosing the most optimal resources for our deployment effortlessly becomes a value addition to our cost optimization process.

A Final Note

As I mentioned at the beginning of this article, the AWS Well-Architected Framework can be used as a guideline or reference in planning any cloud-based system deployment. The vast amount of information provided in the white papers is a treasure trove if you want to gain further knowledge. Here, I have tried to provide a brief overview of the AWS Well-Architected Framework and its general usability in planning a cloud-based deployment. As a final note, I would like to add that the five pillars mentioned above should be reviewed constantly; it is not an overnight task but a process that runs over the lifetime of the deployment.

References

[1] https://aws.amazon.com/architecture/well-architected/

[2] https://aws.amazon.com/blogs/apn/the-5-pillars-of-the-aws-well-architected-framework/

[3] https://cloud.google.com/blog/products/application-development/5-principles-for-cloud-native-architecture-what-it-is-and-how-to-master-it

[4] https://netflix.github.io/chaosmonkey/

[5] https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
