The cloud’s lock-in vs. agility trade-off
Deloitte on Cloud Blog
Many enterprises have the proverbial bad taste in their mouths from years of vendor lock-in on mainframes and database technologies. As they look to the cloud, they don't want to fall into the same trap.
March 12, 2018
A blog post by Mike Kavis, managing director, Deloitte Consulting LLP.
Meanwhile, some cloud service providers (CSPs) are offering a suite of proprietary application programming interfaces (APIs) that offer agility by providing infrastructure as a service (IaaS) and platform as a service (PaaS) capabilities. The higher up the CSP’s stack you go, the more agility you can achieve, but the greater the lock-in risks.
When creating a cloud strategy, enterprises should consider weighing the trade-offs of lock-in vs. agility. Too much focus on lock-in can result in the cloud becoming just a virtual data center. Too much focus on agility can result in a long-term marriage with a vendor and reduced leverage. For some enterprises, the ends of the spectrum are satisfactory, but for many, the answer lies somewhere in the middle.
Over the past several years, I have witnessed the emergence of a few common patterns:
- When agility is the priority, many companies go “all-in” and embrace the CSP’s entire service catalog.
- When risk management is the priority, many companies stick to the basic APIs for network, compute, and storage and stay away from higher-level APIs like database as a service, containers as a service, etc.
- Cloud maturity often dictates how far up the stack the enterprise is willing to go. In these early days of cloud adoption, many enterprises are still in a data center mindset and focus primarily on IaaS solutions (network, compute, storage). As they gain experience in the cloud, they start moving up the stack to PaaS solutions.
- If hybrid or multi-cloud strategies are top priorities, companies usually try to limit the number of services in the service catalog they will use so their software can be portable across clouds.
Many enterprises that go all-in for the public cloud leverage high levels of abstraction provided to them by the CSP. If their goal is agility, they should reconsider spending time managing infrastructure, middleware, and other stack components that typically provide minimal business value. Instead, they could consider leveraging the CSP’s core competencies around managed services, security, scalability, high availability, and other high-level services.
Why spend time managing databases across multiple virtual data centers when it can be done for you? Why deal with the complexities of containers and orchestration engines when it can be abstracted to where it is nothing more than a configuration exercise? Why spend months training personnel on complex machine learning models when there are easy-to-use APIs that can do it for you?
When speed is of utmost importance, consider going all-in and embracing the full stack of your favorite public cloud provider(s).
Pros:
- Speed to market: Shave months off implementation time while creating less technical debt to manage over the long haul.
- Managed services: Focus on what is core to your business and leave the plumbing to the plumbers.
- Integration/consistency: Integrate each service effectively with the rest of the CSP’s service catalog, which helps ensure that important services such as identity and access management (IAM), logging, and monitoring are consistent across the entire stack.
- Innovation curve: Leverage the rapid rate of innovation and the R&D investments of the CSPs, which release new features and services almost daily.
Cons:
- Lock-in: The more services you embrace, the harder and more expensive it is to pivot or port to another CSP.
- Reduced leverage: The more connected you are with the CSP, the less leverage you have in contract negotiations, pricing, and prioritization.
- Support dependency for outages: Although CSPs typically have an excellent track record with performance and service-level agreements, the more APIs you use, the more you are dependent on the CSP’s uptime.
Risk-averse companies often shy away from lock-in and tend to use only the basic network, compute, and storage APIs of public cloud providers. In essence, they are running a data center in the cloud and do not take advantage of more powerful APIs like managed database services that can do a lot of the heavy lifting for them. They may achieve some agility improvements around provisioning time, but developers generally are not able to leverage APIs that could make a huge difference in their delivery times.
Pros:
- Limited lock-in: Less dependence on a single vendor and more options to switch providers.
- Leverage: More negotiation power.
- More control: More control over the architecture and uptime/downtime.
Cons:
- Limited agility gains: Minimal speed-to-market gains and an inability to leverage higher-level services.
- Still managing a “data center”: Using only the foundational APIs simply gives you a virtual data center. You still have to perform many of the functions done in a physical data center, and developers are usually provided with very little tooling.
- Limited, if any, cost savings: Non-value-add tasks are not reduced, and you cannot take advantage of auto scaling APIs such as database management as a service, streaming as a service, etc.
The hybrid approach can provide the best of both worlds: agility and reduced lock-in. However, this greatly increases the architecture’s complexity and is much harder to achieve. There are two types of hybrid approaches.
One approach is to look at each workload and determine which CSP is best suited to run that workload. The other approach is to make a single workload run on any CSP. The first approach is more attainable. You will likely pay a one-time tax to set up the proper guardrails on each CSP, but you can still leverage much of the CSP’s stack. The second approach is much more involved. Each application will need to be architected to be completely portable, which limits how much of the CSP’s stack you can leverage.
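To make the second approach more concrete, here is a minimal sketch of what "architected to be completely portable" can look like in code. It assumes the application talks only to a cloud-neutral contract and that each CSP gets its own adapter behind it. The names `BlobStore`, `InMemoryStore`, and `archive_report` are hypothetical, and the in-memory adapter is only a stand-in for a real provider SDK.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Cloud-neutral storage contract the application codes against (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in adapter; a real one would wrap a single CSP's storage SDK."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: BlobStore, report_id: str, body: str) -> str:
    """Application logic: knows only the BlobStore contract, not the provider."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key


if __name__ == "__main__":
    # The same workload runs unchanged against two simulated providers.
    for provider in ("csp-a", "csp-b"):
        store = InMemoryStore(provider)
        key = archive_report(store, "q1", "quarterly numbers")
        print(provider, key, store.get(key).decode("utf-8"))
```

The cost shows up in the contract itself: it can only expose what every provider supports, which is why this approach limits how much of any one CSP’s stack you can use.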
Pros:
- Balance between lock-in and agility: This allows you to increase portability capabilities and provides the ability, at least theoretically, to switch providers.
- Leverage: Let the CSPs fight it out for the most workloads, which provides you with better negotiation leverage for terms and prices.
- Best cloud for each workload: Use the cloud with the best capabilities for each workload type.
Cons:
- Increased complexity of design and operations: It had better be worth it, because it is a big investment in architecture, development, testing, and operations to pull off a hybrid approach. Certain features like IAM are proprietary on each platform and will essentially have to be redone for each CSP.
- More third-party tools required: To remain as agnostic as possible, you will not be able to leverage certain CSP-native APIs and will need to buy and manage various third-party solutions or, worse, roll your own (see complexity above). Each third-party solution requires additional integration points for each cloud and does not inherit many of the CSP-native APIs such as logging, monitoring, and security.
- Mismatch of features and feature robustness across clouds: Vendors prioritize their road map based on customer demand. Historically, vendors often start on Amazon Web Services, then Microsoft Azure, then Google. It is rare to have feature parity for a given third-party solution across the various cloud providers.
Strategy for balancing lock-in vs. agility
For enterprises that want significant agility while reducing their lock-in risks, consider the following guiding principles:
- Stay away from CSPs’ build-and-deploy tools and use cloud-agnostic solutions. That way, your continuous integration and delivery (CI/CD) pipelines can be configured to deploy to any cloud endpoint.
- Keep the operating system build and patching processes agnostic of the CSP’s services (as much as possible).
- Build loosely coupled systems, preferably using microservices when possible.
- Develop a logging and monitoring framework that can feed any network operation center (NOC). Don’t be afraid to use the CSP’s proprietary monitoring and logging solutions to supplement your other tools as long as you have a standard framework and process flow for monitoring, logging, event management, incident management, etc. that all feed into an enterprise service desk solution.
- Embrace all other CSP APIs to increase agility. APIs that focus on database as a service, Internet of Things (IoT), machine learning, artificial intelligence, and others may eliminate significant amounts of effort required to architect and deploy solutions, not to mention ongoing support costs.
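The logging and monitoring principle above can be sketched as a thin normalization layer: records from each cloud are mapped into one common event schema before they reach the service desk. This is only an illustration under stated assumptions. The `Event` fields, the raw payload shapes for `csp-a` and `csp-b`, and the `route` function are all hypothetical and do not model any real CSP log format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """Common schema every source is normalized into (hypothetical fields)."""
    source: str
    severity: str  # one of: "info", "warning", "critical"
    message: str


# Hypothetical raw payload shapes; real CSP log formats are not modeled here.
def from_csp_a(raw: dict) -> Event:
    return Event("csp-a", raw["level"].lower(), raw["msg"])


def from_csp_b(raw: dict) -> Event:
    severity = {0: "info", 1: "warning", 2: "critical"}[raw["sev"]]
    return Event("csp-b", severity, raw["text"])


NORMALIZERS: dict[str, Callable[[dict], Event]] = {
    "csp-a": from_csp_a,
    "csp-b": from_csp_b,
}


def route(source: str, raw: dict, service_desk: list[Event]) -> Event:
    """Normalize one raw record and queue critical events for the service desk."""
    event = NORMALIZERS[source](raw)
    if event.severity == "critical":
        service_desk.append(event)
    return event


if __name__ == "__main__":
    desk: list[Event] = []
    route("csp-a", {"level": "INFO", "msg": "deploy finished"}, desk)
    route("csp-b", {"sev": 2, "text": "database unreachable"}, desk)
    print(desk)
```

Adding another cloud then means writing one more normalizer, while the downstream event management and incident process stays the same.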
The trade-off decision should be considered for every distinct API. For example, some companies may shy away from using the CSP’s container management APIs because they want a portable container solution. However, if you believe, as I do, that container orchestration engines such as Kubernetes are becoming a commodity that can be easily abstracted on any cloud endpoint, then you might choose to embrace the CSP’s container management APIs.
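One way to make a per-API decision repeatable is to write the comparison down, even crudely. The sketch below is an illustrative assumption, not a method from this post: the 1-to-5 scales, the `agility_weight` parameter, and the `decide` rule are all hypothetical, and the scores would come from your own teams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCandidate:
    name: str
    agility_gain: int  # 1 (low) to 5 (high), estimated by the team
    lock_in_cost: int  # 1 (easy to replace) to 5 (hard to replace)


def decide(api: ApiCandidate, agility_weight: float = 0.5) -> str:
    """Return 'embrace' or 'avoid' from a weighted comparison (illustrative rule)."""
    if not 0.0 <= agility_weight <= 1.0:
        raise ValueError("agility_weight must be between 0 and 1")
    benefit = agility_weight * api.agility_gain
    risk = (1.0 - agility_weight) * api.lock_in_cost
    return "embrace" if benefit >= risk else "avoid"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    for api in (ApiCandidate("managed-db", 5, 4), ApiCandidate("build-tool", 2, 4)):
        print(api.name, decide(api))
```

Raising `agility_weight` models an organization where speed matters more than portability, so the same API can land on different sides of the line for different businesses.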
A strategy for balancing lock-in vs. agility should be part of every cloud strategy. But this strategy should not be chosen using only an IT lens. IT is just one stakeholder in the equation. The business is another important stakeholder and its needs should be weighed at least as much as IT’s. The more you embrace the CSP’s stacks, the more agility you should be able to achieve. The less you embrace the CSP’s stacks, the fewer potential benefits you may get from your cloud strategy. There is no right or wrong answer, but in the age of “speed wins,” where many industries are being disrupted right before our eyes, think twice if your scale is tipping heavily away from the agility side.