What is AIOps and how can it be applied to IT operations?

It may be the latest buzzword in IT operations, but AIOps promises to save teams precious time and effort when it comes to identifying and resolving issues across their increasingly complex estates.

Tom Macaulay Sep 20th 2018

Analyst house Gartner coined the term to describe how the spectrum of AI capabilities can be applied to address IT operations challenges by automatically identifying and reacting to issues.  

A number of vendors are now producing AIOps platforms, which use machine learning technology to provide increased visibility into operations with fewer false alarms and more accurate predictive warnings. To be effective, these platforms have to integrate with other applications via APIs to create vendor-agnostic analytics systems that interact across silos.

Their typical use cases involve post-processing of event streams from monitoring tools, bi-directional interaction with IT service management tools, and integration with automation toolsets to actually implement the insights.

How to develop an AIOps strategy

Gartner recommends implementing AIOps in phases. Early adopters typically start by applying machine learning to monitoring, operations and infrastructure data, before progressing to using deep neural networks for service and help desk automation.

Gartner research director Viv Bhalla suggests identifying the tactical and strategic use cases that could benefit from AIOps, and then evaluating the tools and vendors that fit these needs.

"Phased approaches of AIOps tend to be the most successful,” Bhalla explained at the Moogsoft AI Symposium earlier this month.

"What we’re tending to find is machine learning of event data and of structured data tends to be the lowest hanging fruit, and that's no bad thing. If you're familiar with that, use that as the entry point into embracing this technology.

"Where we're seeing the secondary phase this will evolve into is in language-orientated data, neural networks, behavioural analysis, often around the automation of service desks. What I would say is, there's no harm going for the lowest hanging fruit and then expanding from there.”

He recommends that organisations retrain their infrastructure and operations teams to use the new technology, create a centre of excellence to share ideas from different departments, and then "start small, move fast and validate quickly."

"It's not just a case of thinking 'we've installed it, job done.' It's a strategy, so you're measuring this and evolving this. You're looking for new additional use cases, insights that you may not have conceived before."

The vendor's view

Bhalla believes that off-the-shelf AIOps tools such as Splunk's IT Service Intelligence and Moogsoft AIOps have become more popular than in-house solutions because they can deliver greater consistency and faster time to value.

Moogsoft CEO Phil Tee told Computerworld UK that his company’s software addresses changing needs in IT operations.

"What's created AIOps is we’ve gone past the point where you can anticipate a priori how things are going to break and what they will do when they break,” he said.

"You need to move beyond building rules to manually search for the failure conditions, to starting to use the data that you get to define the logic that is used to look for the failure conditions. When we talk about AIOps, we talk about operations enabled with essentially the full suite of machine learning, data science, and AI techniques."
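The shift Tee describes, from hand-written rules to logic derived from the data itself, can be illustrated with a minimal Python sketch. It is not Moogsoft's method: the function names, the sample response times and the "mean plus three standard deviations" baseline are all assumptions chosen for illustration. Instead of an engineer hard-coding a limit, the alert threshold is computed from past observations.

```python
from statistics import mean, stdev


def learned_threshold(history, k=3.0):
    """Derive an alert threshold from past data: mean + k standard deviations."""
    return mean(history) + k * stdev(history)


def find_anomalies(history, new_values, k=3.0):
    """Return the new values that exceed the threshold learned from history."""
    threshold = learned_threshold(history, k)
    return [value for value in new_values if value > threshold]


# Hypothetical response times in milliseconds (illustrative data only).
history = [100, 102, 98, 101, 99, 100, 103, 97]
new_values = [101, 104, 250, 99]

# The threshold comes from the data, not from a hand-written rule.
print(find_anomalies(history, new_values))
```

With this sample data the learned threshold works out at 106ms, so only the 250ms reading is flagged. A production platform would use far richer models, but the principle is the same: the data defines what counts as abnormal.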

The Moogsoft AIOps platform applies purpose-built machine learning algorithms to data from across IT systems to find and fix problems. Tee believes that its greatest strength is finding the potential failures that humans can't anticipate.

"It's not like engineers are wilfully making mistakes, but the complexity is so huge in IT that it's really impossible for the human mind to be able to encompass it and reduce it to a simple set of go to's," he said.

For example, payment processing company Worldpay turned to Moogsoft to mitigate the failures of its traditional monitoring tools by clustering events into actionable situations and identifying probable root causes, while Daimler is using it to get customer service teams the information they need from the growing volumes of data generated by its vehicles.

"AIOps is for us a solution to react to the real critical events and not on the thousands of events which are coming up from our monitoring tools,” said Rüdiger Schmidt, an IT manager in Daimler's diagnostics and connected vehicle data team.
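To give a rough idea of what clustering events into situations can mean in practice, the Python sketch below groups alerts that arrive close together in time. It is a deliberate simplification and not how Moogsoft's algorithms work: the Event fields, the host names, the messages and the 60-second gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary reference point
    source: str       # hypothetical host name
    message: str


def group_into_situations(events, max_gap=60.0):
    """Group events so that consecutive events no more than max_gap
    seconds apart end up in the same situation."""
    situations = []
    for event in sorted(events, key=lambda e: e.timestamp):
        last = situations[-1][-1] if situations else None
        if last is not None and event.timestamp - last.timestamp <= max_gap:
            situations[-1].append(event)  # close in time: same situation
        else:
            situations.append([event])    # large gap: start a new situation
    return situations


# Illustrative alerts only; none of these come from a real system.
events = [
    Event(0, "db-01", "connection pool exhausted"),
    Event(20, "api-03", "upstream timeout"),
    Event(45, "web-07", "HTTP 503 rate rising"),
    Event(600, "batch-02", "disk usage above 90%"),
]

situations = group_into_situations(events)
for number, situation in enumerate(situations, 1):
    print(f"Situation {number}: {[e.source for e in situation]}")
```

Here four raw alerts collapse into two situations: the database, API and web alerts fall within a minute of each other and are treated as one incident, while the later disk alert stands alone. Real platforms also weigh factors such as topology and message content, which this sketch ignores.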

What's next for AIOps?

Gartner predicts that by 2019, 25 percent of global enterprises will have implemented an AIOps platform that supports two or more major IT operations functions.

Tee thinks that the uses of AIOps will naturally increase as devices gain intelligence, systems become more complicated and data volumes grow.

"I can see the point coming where we can get to the stage of an autonomic network, this self-healing vision of the future, where there's such a weight of shared trained knowledge about how things can fail and how things can go wrong that you're actually able to automatically take remedial action," he said.

"It may be even down to the point of automatic reach-outs to the impacted customers with deep insights into when the service is going to go back online and what they can do in the meantime to live with whatever impact is there. I think AI is going to reach all the way to the customer and I think it's going to reach all the way back to the device."