How do you weigh up the options to choose between EC2, Lambda, ECS, Fargate, EKS, Batch and numerous alternatives? This choice can seem really hard, and our typical decision-making habits don’t make it any easier. Often, we debate our opinions on technology options and force through choices based more on our existing biases and recent successes than on the facts and a simple guiding methodology.
This post details factors to consider and factors that can be avoided when choosing compute services. Simplicity is the guiding principle driving the decision-making process. Keeping the process simple will:
- Reduce time spent making decisions
- Help everyone to understand and buy in to technology decisions
- Keep meetings focused, efficient and effective
In the spirit of simplicity, many other compute options are not included here. Specifically, data-focused services like AWS Glue, Kinesis Data Analytics, Elastic MapReduce (EMR) and any database with a SQL interface should all be considered as valid compute options! These services will be covered in a future post.
Factors that Matter Most
- Direct business value. Don’t invest in building or maintaining anything that has already become a commodity. Building anything that takes time but does not directly yield unique business value is a burden, a cost and a debt that has to be paid off in the future. If you are in the business of selling motor insurance but have teams of people maintaining networks, clusters and other supporting infrastructure, ask how you can improve and re-focus the team on motor insurance products and features.
- Simplicity. The geek in us wants to build things from scratch. We are drawn to solutions with lots of configuration options - elaborate frameworks that we can own and hold up as monuments to both our cleverness and our firefighting skills. In the end, the smartest option is always the simplest. The simplest solution is one where you configure nothing and write as little code as possible.
- Scalability. Most technology options these days claim scalability but that can mean any of a number of things. Ask yourself these questions:
  - How much scalability do we need? Only spend effort on this factor if it matters. Many workloads are predictable and limited, so don’t overthink scalability where it doesn’t apply.
  - How fast and how often should it scale up and down?
  - How does scaling in production compare with scaling in development? Paradoxically, the responsiveness of scaling is sometimes less important in production than in development, where you are iterating frequently and speed matters. If you are waiting minutes for a cluster to rejig itself to host your single-line code change, you might have a problem.
- Cost of maintenance. This is closely related to simplicity but there are subtle differences. Some technology is simple to adopt but more costly to maintain as time goes on. This factor should reflect how easy it is to deploy software and to iterate on changes. It should also include the cost of maintaining the infrastructure itself, including aspects like security, network and storage.
- Evolvability. How comfortable are you with changing or adapting your decision shortly after you have made it? What if things just don’t turn out as well in practice when the production workload kicks in or when users’ needs change? The term Evolvability was used by Werner Vogels last year to describe how AWS build their own products. For your system to be evolvable, the mindset of your organisation, the architecture and the technologies you use must all be amenable to change. Data-driven decision-making only works if you measure effectiveness afterwards and have the potential to switch. Switching is made easier if you go for the simplest option first as you don’t have to worry about the sunk cost of complex infrastructure.
Simplicity is the Trend
This image shows the evolution of commodity computing in the context of AWS. The axes, value and evolution, are borrowed from Wardley Maps. We have largely accepted that it is no longer wise for organisations to invest time and money in building and maintaining physical hardware. So too for virtualisation infrastructure, the next step in the evolution. More recently, we moved to containers and the complex orchestration systems that became necessary to run containers reliably or at scale. This too has been commoditised by cloud vendors.
Factors that Matter Less
Avoid the risk of introducing phantom factors into the equation. Things that often fall into this category are cost, vendor lock-in and skills. The way to eliminate these factors is to address them up front and measure whether they matter or not.
- Cost is frequently cited as a reason to choose one technology or architectural pattern over another. This is of course justifiable but I often see the time spent debating the cost of a service far exceed the potential cost of the service itself! If you bring cost into the conversation you should always include the cost of time and people and make a full comparison.
- Vendor lock-in is similar. All technology choices involve lock-in to something - a framework, programming language or some database. If this is a real concern, assess the risk and the potential cost of moving vendors. Then, include this in your data-driven decision process. In the vast majority of cases, the risk is so low that you can ignore it.
- Skills suitability. Don’t underestimate the ability of your organisation to adopt new technologies and skills. Sure, it takes time and there is always a learning curve. In general, when the case is made clear and people are given the opportunity to learn, they will grab it with both hands. If your company’s technology is driven by input from the teams leading to data-backed decisions, people will accept it and be driven to make it happen.
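To make the cost and lock-in points concrete, here is a minimal sketch of a full comparison that counts people's time alongside the service bill, and that treats lock-in as a probability multiplied by a migration cost. Every figure, the hourly rate and the names `Option`, `total_monthly_cost` and `expected_switching_cost` are invented for illustration; substitute your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    monthly_service_cost: float   # the cloud bill, per month
    monthly_people_hours: float   # hours spent operating it, per month


def total_monthly_cost(option, hourly_rate, debate_hours=0.0, months=12):
    """Service bill plus people time, with decision time spread over `months`."""
    people = option.monthly_people_hours * hourly_rate
    debate = debate_hours * hourly_rate / months
    return option.monthly_service_cost + people + debate


def expected_switching_cost(probability, migration_cost):
    """Lock-in risk expressed as probability of moving times cost of moving."""
    return probability * migration_cost


# Hypothetical figures only.
RATE = 80.0
managed = Option("managed", monthly_service_cost=400.0, monthly_people_hours=2.0)
self_run = Option("self-run", monthly_service_cost=250.0, monthly_people_hours=20.0)

managed_total = total_monthly_cost(managed, RATE)
self_run_total = total_monthly_cost(self_run, RATE)
lock_in = expected_switching_cost(0.05, 100_000.0)

print(f"{managed.name}: {managed_total:.2f} per month")
print(f"{self_run.name}: {self_run_total:.2f} per month")
print(f"expected lock-in cost: {lock_in:.2f}")
```

With these made-up inputs the option with the cheaper bill is the more expensive one overall once people's time is counted, which is the kind of result a bill-only debate would miss.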
Measuring simplicity isn’t an exact science. Further down, I’ll provide a guide to AWS services with a number for simplicity based on our own knowledge and experience. The important thing is that you can start to put numbers on these factors and rank them. The numbers can always be adapted as more information becomes available. These rankings allow you to make quick, clear decisions using the following flow.
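One way to read such a flow is "try the simplest option first and move on only when a hard requirement rules it out". The sketch below assumes that reading; it is not a reproduction of the flow itself. The simplicity scores are invented placeholders, and `choose_simplest` and `can_run_for` are hypothetical helper names. The only hard limit used is Lambda's 15-minute maximum timeout.

```python
def choose_simplest(options, meets_requirements):
    """Return the name of the simplest option that passes the check, or None."""
    # Highest simplicity score first, so the simplest candidate is tried first.
    for option in sorted(options, key=lambda o: o["simplicity"], reverse=True):
        if meets_requirements(option):
            return option["name"]
    return None


# Simplicity scores are invented placeholders (higher = simpler).
# max_runtime_minutes is None where no comparable runtime cap applies.
OPTIONS = [
    {"name": "EC2", "simplicity": 3, "max_runtime_minutes": None},
    {"name": "Lambda", "simplicity": 9, "max_runtime_minutes": 15},
    {"name": "Fargate", "simplicity": 7, "max_runtime_minutes": None},
]


def can_run_for(minutes):
    """Build a requirements check for a job that runs for `minutes`."""
    def check(option):
        cap = option["max_runtime_minutes"]
        return cap is None or minutes <= cap
    return check


short_job = choose_simplest(OPTIONS, can_run_for(5))
long_job = choose_simplest(OPTIONS, can_run_for(45))
print(short_job, long_job)
```

A 5-minute job stays on the simplest candidate, while a 45-minute job falls through to the next simplest one that has no runtime cap.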
Scoring AWS Compute Services
By using a simple scoring system, decision-making becomes faster and much less controversial. It protects against the effects of loud voices, personal opinions and bike-shedding. In our scoring system, we include scaling characteristics, important limits and unique features. Each of the compute options in AWS comes with a large amount of documentation on features, quotas, limits and options. Here, we present the essentials to enable quick decisions so you can deploy quickly, measure, learn and adapt. The full table is available as a downloadable PDF here.
While this table packs in a lot of considerations, it’s designed to gather and distill a lot of information that is already available but hard to trawl through. It’s fine to tweak some numbers based on your experience.
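As a rough illustration of how tweaking numbers changes a ranking, the sketch below scores two hypothetical options against the five factors from earlier in the post and ranks them by weighted average. The option names, scores (1 to 5, higher is better) and weights are all assumptions made up for the example, not values from the table.

```python
FACTORS = ("business_value", "simplicity", "scalability", "maintenance", "evolvability")


def weighted_score(scores, weights):
    """Weighted average of the factor scores (higher is better)."""
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(scores[f] * weights[f] for f in FACTORS) / total_weight


def rank(options, weights):
    """Option names ordered from best to worst weighted score."""
    return sorted(options, key=lambda name: weighted_score(options[name], weights),
                  reverse=True)


# Hypothetical scores. For "maintenance", a higher score means cheaper to maintain.
options = {
    "option_a": {"business_value": 5, "simplicity": 5, "scalability": 4,
                 "maintenance": 5, "evolvability": 3},
    "option_b": {"business_value": 4, "simplicity": 2, "scalability": 5,
                 "maintenance": 2, "evolvability": 4},
}

equal_weights = {f: 1 for f in FACTORS}
# A team that cares mostly about scaling and the freedom to change course later.
scale_heavy = {**equal_weights, "scalability": 5, "evolvability": 5}

equal_ranking = rank(options, equal_weights)
scale_ranking = rank(options, scale_heavy)
print(equal_ranking, scale_ranking)
```

Keeping the weights explicit means a disagreement becomes a discussion about one number rather than about the whole decision.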
Start with Simple
The message here can be reduced to one principle with a major impact on your ability to deliver repeatedly on business outcomes - start with simple. The decision-making process can itself be simplified by reducing factors to an uncomplicated scoring system that facilitates clear choices everyone can understand. It is also designed to work well in organisations that value a learning approach, measuring and adjusting as part of a continuous improvement process. Can you make this process work in your organisation? What’s holding you back? How will you adapt it to foster quick, clear decisions?