Cloud and AI innovation in operations




Can AI-enabled processes and a product-centric model help Ops keep pace with transformation?

A blog post by Mike Kavis, chief cloud architect, Deloitte Consulting LLP

Ops is undergoing a major transformation

Innovation is moving at a pace never seen before. IT leaders are being asked to spend less while implementing new technologies and concepts such as cloud, DevOps, machine learning (ML), artificial intelligence (AI), the Internet of Things (IoT), and many others. To keep up with the pace of change, organizations must transform the way they work.


When I look at articles on IT innovation, most of what I find is about how organizations are changing the way they build software or engage with customers. Yet operations is also changing radically, with a lot less fanfare. In this post I’ll cover two areas where I see major change happening in operations: Cloud Platforms and AIOps.

Cloud Platforms

Not only is the pace of change fast, but many development teams are also bringing products and services to market faster. Embracing a DevOps mindset has changed the way teams think about software delivery. In my view, continuous integration and continuous deployment (CI/CD) have become the standard way of building applications. Teams are embedding many security and governance controls in these CI/CD pipelines, which can reduce or even eliminate meetings such as the CAB (change advisory board) and drastically shorten time to market. Just this week I saw an old waterfall release checklist that allowed 17 days between filling out the “release to production” form and the actual deployment. Many companies have moved far beyond those legacy checklist processes and can deploy code weekly, daily, or even multiple times per day.
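To make the idea of embedding governance controls in a pipeline more concrete, here is a minimal sketch of an automated release gate that evaluates, on every build, the kind of checks a CAB might otherwise review by hand. The `ReleaseCandidate` fields, the control names, and the 80% coverage threshold are hypothetical, chosen only for illustration; in practice teams would typically use their CI/CD tool's own policy and approval features.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ReleaseCandidate:
    """Hypothetical summary of a build produced earlier in the pipeline."""
    name: str
    unit_tests_passed: bool
    coverage_pct: float
    critical_vulns: int
    change_record_created: bool


# Each control pairs a human-readable description with an automated check.
# The thresholds below are illustrative assumptions, not recommendations.
Control = Tuple[str, Callable[[ReleaseCandidate], bool]]

CONTROLS: List[Control] = [
    ("unit tests pass", lambda rc: rc.unit_tests_passed),
    ("coverage at least 80%", lambda rc: rc.coverage_pct >= 80.0),
    ("no critical vulnerabilities", lambda rc: rc.critical_vulns == 0),
    ("change record created", lambda rc: rc.change_record_created),
]


def evaluate_gate(rc: ReleaseCandidate, controls: List[Control] = CONTROLS) -> List[str]:
    """Return the description of every control the candidate fails; an empty list means deploy."""
    return [desc for desc, check in controls if not check(rc)]


if __name__ == "__main__":
    candidate = ReleaseCandidate("orders-api", True, 86.5, 1, True)
    failed = evaluate_gate(candidate)
    if failed:
        print("Deployment blocked; failed controls:", ", ".join(failed))
    else:
        print("All controls passed; promoting to production")
```

Because every control runs on every build, the evidence a review board would ask for is produced automatically, and a release is blocked only when a specific, named control fails.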

So how does Ops absorb these new technologies while also managing a much higher frequency of deployments? The answer is that Ops teams are moving away from being ticket takers and toward becoming platform providers.

Many platform teams have shifted from executing all things ops to building products and services that enable builders to run what they build. The platform approach to the cloud automates the organization’s controls and policies by building a layer of “guardrails” on top of the chosen cloud provider(s) and offering cloud services to the builders in a safe and compliant environment. The platform provides access to logging and monitoring tools so that builders can troubleshoot their own apps without having to log on to any infrastructure. In this model, Operations can focus solely on the cloud platform, and builders no longer throw their stuff over the wall to the ops team. The ops team is now part of a product team, where the product is the cloud platform. The platform team becomes an internal cloud service provider and must adopt that mindset (see shared responsibility model below). Its customers are the builders, and its focus should be enabling them with tools to build and run their apps. At the same time, Ops is responsible for the service level objectives (SLOs) and service level indicators (SLIs) of the cloud platform.
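Guardrails of this kind are often expressed as policy-as-code that evaluates every resource request before it reaches the cloud provider. The sketch below assumes a hypothetical request shape (a dict with `type`, `region`, `tags`, `encrypted`, and `public` keys) and made-up organizational policies (`ALLOWED_REGIONS`, `REQUIRED_TAGS`); real platforms typically rely on the provider's native policy service or a general-purpose policy engine rather than hand-rolled code like this.

```python
from typing import Any, Dict, List

# Hypothetical organizational policy values, for illustration only.
ALLOWED_REGIONS = {"us-east-1", "eu-west-1"}
REQUIRED_TAGS = {"owner", "cost-center"}


def check_guardrails(request: Dict[str, Any]) -> List[str]:
    """Return the policy violations for a resource request; an empty list means compliant."""
    violations: List[str] = []

    # Data-residency style guardrail: only approved regions may be used.
    if request.get("region") not in ALLOWED_REGIONS:
        violations.append(f"region {request.get('region')!r} is not approved")

    # Tagging guardrail: every resource must be attributable to an owner and a budget.
    missing = REQUIRED_TAGS - set(request.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))

    # Storage-specific guardrails: encryption at rest and no public access.
    if request.get("type") == "storage_bucket":
        if not request.get("encrypted", False):
            violations.append("storage must be encrypted at rest")
        if request.get("public", False):
            violations.append("public storage buckets are not allowed")

    return violations


if __name__ == "__main__":
    req = {
        "type": "storage_bucket",
        "region": "us-east-1",
        "tags": {"owner": "team-orders"},
        "encrypted": True,
        "public": True,
    }
    for violation in check_guardrails(req):
        print("DENY:", violation)
```

Returning every violation at once, rather than stopping at the first, gives builders a complete list to fix in one pass, which fits the idea of the platform team serving builders as customers.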

Ops also owns vendor management of the cloud platform in this approach. Examples of vendors they manage are:

  • Cloud service providers 
  • Logging and monitoring vendors
  • Security and compliance vendors
  • CI/CD toolchain vendors (if the platform owns CI/CD centrally)
  • Any other third-party solutions that make up the platform offering

Moving from operators to product specialists can be a drastic culture change. But this is the future of cloud.


AIOps

I consider AIOps another horrible buzzword that marketers use to mean whatever they want in order to sell everything they have. Gartner defines AIOps as the combination of “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination”. I spent some time trying to visualize what AIOps is and came up with the diagram below.
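Gartner's definition names anomaly detection as one of the processes AIOps automates. As a deliberately simple stand-in for the ML models that commercial tools use, the sketch below flags metric samples that deviate from a rolling baseline by more than a chosen number of standard deviations (a rolling z-score). The window size, the threshold of 3, and the latency figures are assumptions for illustration only.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


def rolling_zscore_anomalies(
    samples: Iterable[float], window: int = 5, threshold: float = 3.0
) -> List[Tuple[int, float]]:
    """Return (index, value) pairs whose z-score against the previous `window` samples exceeds `threshold`."""
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[Tuple[int, float]] = []
    for i, value in enumerate(samples):
        # Only score a sample once a full baseline window is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Flat baseline: any change at all counts as anomalous.
                if value != mu:
                    anomalies.append((i, value))
            elif abs(value - mu) / sigma > threshold:
                anomalies.append((i, value))
        # Note: anomalous values also enter the baseline here, which widens it
        # afterwards; a production detector might exclude them.
        history.append(value)
    return anomalies


if __name__ == "__main__":
    # Hypothetical p99 latency samples in milliseconds.
    latencies_ms = [120, 118, 125, 122, 119, 121, 480, 123, 120]
    print(rolling_zscore_anomalies(latencies_ms))
```

A static threshold would need hand-tuning per metric; a baseline computed from recent data adapts to each metric's normal range, which is the basic idea that more sophisticated ML-based detectors build on.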

Regardless of how we define the term, the point is that the world is moving toward intelligent operations that leverage bots, image and voice recognition, ML, AI, and other emerging technologies. Many of the clients I have worked with over the years are still in the reactive or proactive operations modes described in the image above. In my opinion, staying in those modes is not sustainable if your company is increasing its deployment rate to multiple times a day. Throwing more headcount at the problem is not a viable long-term strategy. Organizations should look to more automation, especially to help systems heal themselves.
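Self-healing automation of the kind described above often follows a detect, remediate, verify, escalate loop. The following is a minimal sketch of that loop. The `check_health`, `restart`, and `escalate` callables and the retry limit are hypothetical placeholders for whatever probes and runbook actions a team actually has; real tooling would also add backoff, audit logging, and safeguards against runaway remediation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RemediationResult:
    service: str
    healed: bool
    attempts: int
    escalated: bool


def self_heal(
    service: str,
    check_health: Callable[[str], bool],
    restart: Callable[[str], None],
    escalate: Callable[[str], None],
    max_attempts: int = 3,
) -> RemediationResult:
    """Try automated restarts up to `max_attempts` times; involve a human only if they all fail."""
    if check_health(service):
        return RemediationResult(service, True, 0, False)
    for attempt in range(1, max_attempts + 1):
        restart(service)              # remediate
        if check_health(service):     # verify
            return RemediationResult(service, True, attempt, False)
    escalate(service)                 # automation exhausted: page on-call
    return RemediationResult(service, False, max_attempts, True)


if __name__ == "__main__":
    # Simulated service that becomes healthy after its second restart.
    restarts = {"count": 0}

    def fake_health(name: str) -> bool:
        return restarts["count"] >= 2

    def fake_restart(name: str) -> None:
        restarts["count"] += 1
        print(f"restarting {name} (attempt {restarts['count']})")

    def fake_escalate(name: str) -> None:
        print(f"paging on-call for {name}")

    print(self_heal("checkout-svc", fake_health, fake_restart, fake_escalate))
```

Passing the probe and actions in as functions keeps the loop itself generic, so the same policy (retry a bounded number of times, then escalate) can wrap very different services.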

Another driver of AIOps is an uptick in new products and services that rely on devices or compute power out on the edge. In these scenarios, there is much less control over the stack than in data centers, or even in clouds. Whether these devices are part of solutions for connected health care, smart cities, drone surveillance, or smart factories, organizations should embrace more intelligent operations for the things that are “out in the wild” and can’t be easily serviced.

AIOps isn’t just for IoT use cases, though. The adoption of cloud has driven up complexity within the enterprise. Many enterprises have hybrid or multi-cloud implementations (or both), making their infrastructure sprawl incredibly challenging to manage. In addition, cloud native architectures are highly distributed and elastic, which creates new operational challenges. Add newer building blocks like containers, microservices, and serverless, and our previous methods of operating systems just don’t cut the mustard anymore. Operators are leaning on intelligent tools to help them manage these complex systems.
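In a highly distributed system, one fault can fan out into many alerts across dependent services, which is why event correlation appears in Gartner's definition of AIOps. The sketch below is a simplified, rule-based version (not machine learning) that groups alerts into incidents when they arrive within a time window of each other and their services are directly connected in a dependency map. The `DEPENDS_ON` graph, the alert fields, and the 60-second window are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical service dependency map: service -> services it calls directly.
DEPENDS_ON: Dict[str, Set[str]] = {
    "web": {"orders"},
    "orders": {"payments", "inventory-db"},
    "payments": set(),
    "inventory-db": set(),
    "reports": {"warehouse"},
    "warehouse": set(),
}


@dataclass
class Alert:
    ts: float      # seconds since some reference time
    service: str
    message: str


def related(a: str, b: str) -> bool:
    """True if the services are the same or one directly depends on the other."""
    return a == b or b in DEPENDS_ON.get(a, set()) or a in DEPENDS_ON.get(b, set())


def correlate(alerts: List[Alert], window_s: float = 60.0) -> List[List[Alert]]:
    """Greedily group time-sorted alerts: join the first incident whose latest alert is
    within `window_s` and that contains a related service; otherwise open a new incident."""
    incidents: List[List[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.ts):
        for incident in incidents:
            recent = alert.ts - incident[-1].ts <= window_s
            if recent and any(related(alert.service, x.service) for x in incident):
                incident.append(alert)
                break
        else:
            incidents.append([alert])
    return incidents


if __name__ == "__main__":
    alerts = [
        Alert(12, "web", "5xx rate above threshold"),
        Alert(0, "inventory-db", "connection pool exhausted"),
        Alert(5, "orders", "p99 latency high"),
        Alert(20, "reports", "nightly job failed"),
        Alert(300, "payments", "timeout spike"),
    ]
    for n, incident in enumerate(correlate(alerts), start=1):
        print(f"incident {n}:", [a.service for a in incident])
```

Here five alerts collapse into three incidents, with the database, order service, and web tier grouped together. Commercial AIOps tools learn these relationships from topology and historical data rather than relying on a hand-maintained map.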

Many enterprises are going through a major transformation as they try to become more “digital”. Technologies such as cloud computing, AI, ML, robotic process automation (RPA), IoT, and others are radically changing the way we build and run products. While much of the press focuses on the impact on customers and developers, Ops is going through a major transformation as well. Some people in Ops may see this as a threat, but I believe this is the golden age of operations. Operations should be a first-class citizen, sitting at the table in sprint zero of each product road map and helping architect end-to-end solutions that are resilient and scalable. Greenfield systems should be designed with operations in mind.

To get there, IT should shift toward a more product-centric model, where full-stack teams build and run their systems on top of hardened, managed cloud platform(s). Ops plays a major role both in the platform services and in the product itself, and it is time to get out of reactive mode and join the party.


