Digital platforms are increasingly becoming the core strategy of enterprises to better integrate customers, partners and suppliers in a marketplace to scale demand and supply. In addition, internal IT ecosystems are digitizing the core with platform and micro services-based approaches for agility.
Platforms, therefore, are becoming a significant representation of corporate brands, and user experience will be a key success factor. Platforms have an increasing need to have consistent high performance and stability.
Downtimes, poor responsiveness, inconsistent data have and will increasingly deliver losses to the tune of hundreds of millions in the new ‘platform-itized’ world. Inability to make real-time decisions and respond effectively to events will be disruptive to business.
Given this criticality, AI enabled operations (AIOps) is a trend that needs to be adopted by enterprises with considerable urgency. Self-monitoring, managing and healing platforms will prove to be a strategic differentiator, and the power of AIOps today has the capability to deliver this effectively. This needs to become a core part of every platform architecture and design.
Self-healing demystified
'Self-healing' as the term indicates, is the intelligence of the platform to detect anomalies, before or on-time, that will take place and be able to self-initiate corrective actions. Self-healing solutions need to instrument monitoring mechanisms to detect anomalies to defined KPI ranges, degrading performance and infrastructure, platform and application failures. Thereafter, data from these instrumentations are monitored for trends and signals leading to anomalies. Self-healing frameworks leverage this to determine next best actions. Next best actions can be determined based on rules derived from prior experiences and/or machine learned from incident and platform logs information. Ability to correctly determine this next best action is the core intelligence in self-healing platforms. Based on this, appropriate response is initiated which includes but not limited to notifications, re-try's, capacity expansion, invoking exception responses, application of patches etc. Below diagram depicts this solution framework.
Typical construct of self-healing solutions
Typical responses of self-healing solutions
Self-healing solutions can respond to anomalies in various ways depending on the circumstances and the confidence on the anomaly based on prior events. Some indicative ones are as follows:
Self-healing opportunities for data platforms
Big data platforms, data lakes, massive data warehouses play a crucial role in the digital journey of enterprises. Availability of data is a precursor to modern ways of working and below are some of the unique opportunities for data platforms to be responsive to anomalies and self-heal themselves:
In summary
Self-healing strategies will be an imperative for next generation maturity of platforms. AI will play an increasing role in such platform engineering, taking away the need for expert engineers to spend time on routine effort to keep lights on, crucial dollar savings that will be leveraged to deepen digital transformations.
Arunabha Mookerjea is a specialist leader and distinguished cloud architect in the Strategy and Analytics practice at Deloitte Consulting India Private Limited. He specializes in technology advisory, solution architectures and directing large scale delivery in next generation areas of cloud and big data platforms, IoT, micro-services and digital core solutions. Arunabha is a member of the Next Gen Architecture Program.