Under pressure to keep numerous digital services constantly available to employees and customers, many IT workers are at high risk of burnout. AIOps, short for artificial intelligence (AI) for IT operations, could ease that stress by automating and simplifying complex, time-consuming tasks.
“IT operations cannot sustain the current model. In fact, it’s been slipping further behind business operations for far too long. The volume of data is growing exponentially, and if IT operations don’t make a revolutionary leap forward, the industry will see lots more business systems failing to meet customer and employee needs,” says Carlos Casanova, principal analyst at market research company Forrester.
“AIOps [can help by] enhancing human judgment and making a leap towards a proactive and predictive operational footing set atop an AI or machine learning and data science foundation,” he adds.
The primary driver for AIOps adoption is gaining increased insights. “Those insights can be very broad and can have slightly different focuses, depending on the organisation. The common thread, however, is a deeper understanding of what is going on with the entire system (i.e., from the backend through infrastructure/cloud and out to the end-user). With this deep understanding, many things can be done, actions taken, and efficiencies realised,” Casanova explains.
Optimising IT operations
According to Colin Tan, general manager and technology leader for Singapore at tech giant IBM, AIOps can help optimise IT operations in four ways.
Firstly, AIOps can enable proactive incident resolution. By cutting through IT operations noise and correlating operations data from multiple IT environments, AIOps can help identify root causes and propose solutions faster and more accurately than previously possible.
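To make the idea of cutting through noise more concrete, the Python sketch that follows groups alerts that hit the same service within a few minutes of each other into a single incident. It is a deliberately simple, rule-based illustration, not how IBM or any other vendor implements correlation; the `Alert` fields, the five-minute window and the sample alerts are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str        # hypothetical: the tool that raised the alert
    service: str       # hypothetical: the affected application or service
    message: str
    timestamp: datetime


def correlate_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts for the same service into one incident when each arrives
    within `window` of the previous alert for that service."""
    incidents = []        # list of incidents, each a list of alerts
    open_incidents = {}   # service name -> its most recent incident
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incidents.get(alert.service)
        if current and alert.timestamp - current[-1].timestamp <= window:
            current.append(alert)          # same burst: fold into the incident
        else:
            current = [alert]              # quiet gap or new service: new incident
            open_incidents[alert.service] = current
            incidents.append(current)
    return incidents


# Hypothetical alerts from three monitoring sources.
t0 = datetime(2023, 1, 1, 9, 0)
alerts = [
    Alert("apm", "checkout", "p99 latency high", t0),
    Alert("infra", "checkout", "CPU saturation on node-3", t0 + timedelta(minutes=1)),
    Alert("infra", "search", "Disk 90% full", t0 + timedelta(minutes=2)),
    Alert("logs", "checkout", "HTTP 503 spike", t0 + timedelta(minutes=3)),
]
incidents = correlate_alerts(alerts)
for incident in incidents:
    print(incident[0].service, len(incident), "alert(s)")
```

Four raw alerts collapse into two incidents, so an engineer looks at the checkout problem once rather than three times. A real AIOps platform would typically draw on richer signals, such as service topology and historical data, rather than a fixed time window.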
“Moreover, AIOps will keep getting better at identifying less urgent alerts or signals that correlate with more urgent situations as it never stops learning. This means it can provide predictive alerts that let IT teams address potential problems before they lead to slowdowns or downtimes that are estimated to cost upwards of hundreds of dollars per hour,” says Tan.
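Tan's point about less urgent signals that precede bigger problems can be illustrated with a basic statistical check. The sketch flags any metric sample that sits more than three standard deviations from the mean of the preceding ten samples (a rolling z-score). This is a stand-in assumption for the machine learning models Tan describes: it does not learn over time, and the latency figures are invented for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=10, threshold=3.0):
    """Return the indices of samples that deviate more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat history (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds; the last sample is a spike.
latency_ms = [120, 118, 125, 122, 119, 121, 124, 120, 123, 118, 122, 121, 310]
print(detect_anomalies(latency_ms))
```

Raising the threshold cuts false alarms at the cost of catching fewer early warnings; learning-based systems are meant to handle that trade-off with less manual tuning.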
Secondly, AIOps can help organisations optimise their resources. Tan explains that modern applications often sit behind multiple layers of abstraction, making it difficult to see which underlying physical server, storage, and networking resources support which applications. Traditional ‘set it and forget it’ methods of forecasting resource requirements often lead to overallocation or, worse, starve applications of resources during usage spikes.
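As a rough illustration of the alternative to ‘set it and forget it’ estimates, the sketch sizes an application's CPU allocation from its observed usage: the 95th percentile of recent samples plus 20% headroom. The percentile rule, headroom figure and usage data are all hypothetical, and a simple percentile is a swapped-in technique rather than the forecasting models AIOps tools actually use.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile, with pct between 0 and 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_allocation(usage, pct=95, headroom=0.2):
    """Hypothetical sizing rule: the pct-th percentile of observed usage
    plus a safety margin, instead of a fixed up-front estimate."""
    return percentile(usage, pct) * (1 + headroom)


# Hypothetical hourly CPU usage (in cores) for one application, with one spike.
cpu_cores = [2.1, 2.4, 2.2, 3.0, 2.8, 2.5, 6.5, 2.3, 2.6, 2.9]
static_estimate = 16  # a 'set it and forget it' allocation, in cores

recommended = recommend_allocation(cpu_cores)
print(f"static: {static_estimate} cores, usage-based: {recommended:.1f} cores")
```

With these numbers the recommendation is 7.8 cores against a static 16, while still covering the 6.5-core spike. Choosing a lower percentile would save more but could starve the application during spikes, the very risk Tan describes.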
“AI can be used to assure system and application performance continuously and automatically while safely reducing costs and increasing efficiency. With data centres emerging as the fastest-growing source of greenhouse gas emissions, the ability to control network performance and direct bandwidth more efficiently to where it’s really needed also plays an increasingly important role in companies’ sustainability goals,” he claims.
Thirdly, AIOps can help enhance observability across the enterprise.
“Businesses are rapidly adopting modern development practices and, as a result, are bringing more services to market faster than ever. That means complexity is skyrocketing, because applications are being deployed so frequently and in so many different locations that traditional application performance monitoring can’t keep up,” says Tan.
By turning to observability, organisations can aggregate, correlate, and analyse a steady stream of performance data from distributed applications and the hardware they run on, to monitor, troubleshoot, and debug their applications more effectively. AI and automation will also enable teams to identify and fix issues in new code as well as predict application issues based on system outputs.
Finally, organisations can leverage AIOps to improve the performance and efficiency of IT teams. “AIOps presents an enormous opportunity to replace disparate, manual IT operations tools with a single, intelligent, and automated IT operations platform. The more an AIOps system learns and automates, the more it helps ‘keep the lights on’ with less human effort through identifying potential issues before an outage occurs. IT operations teams can therefore focus on tasks with greater strategic value to the business,” says Tan.
Not a panacea
Despite its benefits, AIOps is not a panacea for all that ails IT operations, and it can introduce challenges of its own. For instance, AIOps can help detect issues in advance, but only with sufficient data. “It’s critical to bring in at least one month’s worth of data from all sources in the environment for any machine learning models of AIOps to forecast issues with accuracy,” states Dhiraj Goklani, vice-president, observability, APAC at Splunk, a provider of a data platform for security and observability.
AIOps is also not plug-and-play. Goklani explains: “It is still a tool that needs to be programmed and monitored. The IT team will need to put in the hard work at the beginning of the process and make sure they are ingesting high-quality data, spotting inaccurate conclusions and reviewing automated response workflows. Thereafter, companies will still need a team to monitor the data that is fed into the platform, understand the cruciality of the applications and systems and ensure the efficiency and effectiveness of automated workflows.”
“[Instead of replacing humans,] AIOps is designed to augment humans and enable them to switch their attention to higher-value tasks with greater efficiency and creativity. It enables humans to interpret data faster by detecting anomalies, highlighting patterns or trends, and possibly finding root causes. But machines are still a long way from making autonomous decisions, as that requires advances in observability, automatic remediation, and workflow automation,” says Sathyan Sethumadhavan, AI architect at Thoughtworks.
Besides that, not all AIOps tools are the same. “Some tools might look like a black box, leaving users clueless about how their machine learning functions and makes correlations, while others require significant time and practice for training, configuration, and onboarding,” warns IBM’s Tan.
AIOps strategy
Using AIOps to automate and enhance IT operations can reduce customer-facing outages by up to 50% and mean time to recovery (i.e., the average time to recover from any failure) by up to 95%, according to a July 2021 Forrester study commissioned by IBM.
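Mean time to recovery is simple arithmetic: total recovery time divided by the number of incidents. The sketch computes it for three made-up incidents and shows what a 95% reduction, the upper bound cited in the Forrester study, would mean for that figure. The incident data is hypothetical, and this is not the study's methodology.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (recovered - started) over a list of (started, recovered) pairs."""
    durations = [recovered - started for started, recovered in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: 60, 30 and 90 minutes to recover.
incidents = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 10, 0)),
    (datetime(2023, 3, 8, 13, 0), datetime(2023, 3, 8, 13, 30)),
    (datetime(2023, 3, 15, 20, 0), datetime(2023, 3, 15, 21, 30)),
]
mttr = mean_time_to_recovery(incidents)
reduced = mttr * (1 - 0.95)  # what a 95% reduction would leave
print("MTTR:", mttr)  # MTTR: 1:00:00
print("After a 95% cut:", reduced)
```

At the top of that range, an hour-long average would drop to about three minutes; because the study cites these figures as maximums, actual gains will vary.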
However, simply deploying AIOps tools will not guarantee those benefits. AIOps, asserts Forrester’s Casanova, calls for the transformation of work.
“AIOps is not about improving technology or even response times. It’s about translating these technological or operational improvements into positive business outcomes,” says Casanova.
“For example, upskilling employees with new data science skills enables them to comprehend complex environments better. It gives them the opportunity to use their education more than their hands. It gives them a greater sense of worth. It helps them see that what they do daily is helping the company grow and prosper. This translates into happier employees and lower employee turnover. Happy employees are more productive, committed to their employer, and better all-around employees. Every aspect of the business benefits from this,” he explains.
Thoughtworks’s Sethumadhavan adds that a successful AIOps strategy should cover three key areas:
- Organisational capability for scale and AI governance
- Operational competencies for lifecycle management
- Analytics platform modernisation for technological advancements
He continues: “Additionally, stakeholders throughout the organisation need to be informed about the transformation process, the data-driven mindset, AIOps as an enabler, and a roadmap for upskilling. That way, the symphony between different departments and individuals is clearly understood. The implementation strategy should begin by identifying high-value, less complex, low-effort problem statements, along with a roadmap for continuous improvement.”
Meanwhile, Splunk’s Goklani emphasises the need for organisations to consolidate their data. “AIOps uses AI algorithms to automatically group notable events based on their similarity. Organisations can, therefore, fully benefit from AIOps by using AI and machine learning to bring data sources together, break data silos and allow IT teams a chance to have a more complete analysis and insights into systems. [That way,] IT teams can find and fix complicated issues more quickly and efficiently, saving time, money, and resources.”
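The article does not say how Splunk or any other vendor measures similarity between events. As one simple stand-in, the sketch masks numbers in each event message, compares messages by the Jaccard similarity of their word sets, and greedily assigns each event to the first group that is similar enough. The 0.5 threshold and the sample events are hypothetical.

```python
import re


def tokens(message):
    """Lower-cased word set, with digits masked so IDs and counts don't block a match."""
    return set(re.sub(r"\d+", "<n>", message.lower()).split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_events(messages, threshold=0.5):
    """Greedy grouping: each event joins the first group whose representative
    is at least `threshold` similar; otherwise it starts a new group."""
    groups = []  # list of (representative token set, member messages)
    for msg in messages:
        toks = tokens(msg)
        for rep, members in groups:
            if jaccard(toks, rep) >= threshold:
                members.append(msg)
                break
        else:  # no sufficiently similar group found
            groups.append((toks, [msg]))
    return [members for _, members in groups]


# Hypothetical event messages from different systems.
events = [
    "Connection timeout to db-01 after 30 s",
    "Connection timeout to db-02 after 45 s",
    "Disk usage 91% on /var",
    "Connection timeout to db-07 after 30 s",
    "Disk usage 95% on /var",
]
for group in group_events(events):
    print(len(group), "x", group[0])
```

Five events become two groups. Production tools would typically weigh more than the message text, such as host, service and timing.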
Future of IT operations
Maintaining and improving IT operations is anything but simple given current challenges such as labour shortages, geographic disruptions, demand spikes and growing cybersecurity threats. With its ability to help IT teams predict and prevent incidents before they happen, AIOps is poised to become the future of IT operations.
“AIOps is a practical and readily available way to help organisations grow and scale IT operations to meet future challenges by improving responsiveness, streamlining complex operations and increasing the productivity of IT staff. [This is why Gartner projects] AIOps to grow to a market size of about US$2.1 billion in 2025 at a CAGR of around 19%,” says Splunk’s Goklani.
IBM’s Tan adds: “Not only do AIOps systems ensure that IT service issues are resolved promptly, but they also provide a safety net for IT operation teams. [They can help] address issues that may fall through the cracks due to human oversight, such as organisational silos, under-resourced teams and more. We will continue to see the acceleration of automation adoption and expansion of AIOps, leading to more meaningful applications in different industries.”
Echoing Tan’s sentiment, Thoughtworks’s Sethumadhavan says: “AIOps today is mostly focused on detection, root cause analysis and recommendations. Future AIOps solutions would focus on data fabric such as data with automation, lineage, security, governance and access, as well as observability, edge computing and hyper-automation. This would pave the path towards actionable AI insights, mitigate risks with explainability and scale operations to support business agility.”