
Artificial Intelligence is finding its way into many corners of the IT world. In particular, it is making a name for itself in operations. The ability of Artificial Intelligence to monitor and make sense of data streams makes it a natural tool to use in monitoring and managing IT environments.
How does AIOps fit in?
Today’s IT operations are moving from managing physical systems to running self-configuring, self-healing, and self-managing software-defined networks. Managing the new environment demands equally dynamic technologies and processes.
IT Service Management (“ITSM”) is already a mature discipline, and the line between ITSM and IT Operations Management (“ITOM”) is becoming increasingly blurred. Machine learning and analytics are enabling that convergence, and they are a natural home for AI applications.
The next step is to use machine learning and data analytics to create automated workflows that the IT Support team can use to optimize their workload.
What is AIOps?
Industry observers consider that an IT environment needs to be managed at three levels:
- Systems. At the heart of an AI-enabled environment is a suite of modular systems that are called on when needed. They must also be capable of operating in campus and multi-site environments.
- Data. The AI-enabled environment receives a complex mix of information from the underlying systems: logs, metrics, and alerts, among others. The data is high-volume and arrives in a variety of formats. Some of it needs immediate processing, for example when network traffic analysis is used to quickly identify a possible DDoS attack and initiate countermeasures.
- Tools. The outermost layer of the AI environment is the tools used to work with the data and systems. AI systems are already available in the marketplace. When selecting a portfolio of tools, be aware that many have limited functionality. Because they lack standard interfaces, they do not integrate easily, which creates interoperability issues.
Some tools claim to provide a full range of integrated functions, but such claims often need to be verified, since interoperability may be limited to a single vendor’s equipment.
In short, AIOps uses machine learning algorithms and analysis tools to provide immediate alerts of potential problems, often before IT staff notice them.
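The kind of early warning described above can be illustrated with a small example. The following is a minimal Python sketch, not a description of any particular AIOps product: it flags a traffic reading that rises far above a rolling baseline, as might happen at the start of a DDoS attack. The function name `find_traffic_spikes`, the window size, the threshold, and the sample figures are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


def find_traffic_spikes(samples, window=5, threshold=3.0):
    """Return the indexes of readings that rise sharply above the recent baseline.

    samples:   per-interval traffic readings (for example, requests per second)
    window:    number of earlier readings used as the baseline (assumed value)
    threshold: standard deviations above the mean that count as a spike (assumed value)
    """
    baseline = deque(maxlen=window)
    spikes = []
    for index, value in enumerate(samples):
        if len(baseline) == window:
            average = mean(baseline)
            spread = stdev(baseline)
            # A perfectly flat baseline has zero spread; skip the check
            # rather than divide by zero.
            if spread > 0 and (value - average) / spread > threshold:
                spikes.append(index)
                continue  # keep the spike itself out of the baseline
        baseline.append(value)
    return spikes


# Hypothetical readings with one sudden surge at index 7.
requests_per_second = [100, 104, 98, 101, 97, 103, 99, 950, 102, 100]
print(find_traffic_spikes(requests_per_second))
```

With these sample readings, only the surge to 950 is reported. A production system would likely use more robust statistics and would also need to handle gradual increases, which this simple check does not detect.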
How does it work?
AIOps has been described as the nucleus of your digital operations environment.
It has five basic features:
- Getting the correct data. This step identifies the data you need and makes sure it is available for analysis.
Data pours in from many different places and in a host of different formats, and it often contains conflicting or duplicated information. If not controlled, this can lead to duplicated or missed remedial efforts.
- Identifying patterns. As an example, one of the key anti-malware strategies is to identify sudden and unexplained changes in network traffic patterns.
AIOps can sift through this morass of data. Algorithms can collate information from multiple sources, filter out noise and duplication, and pass on only the necessary information.
- Deductive reasoning. Once an anomaly has been identified, the next step is to find its root cause.
Subsequent analysis then identifies the data patterns that signify the causes or results of unusual events.
- Putting up a signal flare. If AIOps finds a problem, it needs to communicate immediately with the appropriate staff members or collaborative teams. The problem may also call for corrective actions to be carried out automatically and immediately while alerts go out by SMS, email, or social media. There may be several levels of alerts.
This allows multi-disciplinary teams to assemble quickly and start work on resolving the issue.
- Automation. A final step could be to automate some responses and corrective actions. This is already being done in software-defined networking.
Automated processes can integrate with existing CRM and trouble ticketing systems.
Finally, the AIOps platform can store each incident’s records, improving the identification, speed of response, and quality of future reactions.
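To make the collation and alerting steps above more concrete, here is a minimal Python sketch under stated assumptions. It merges duplicate reports of the same problem from different monitoring sources and then picks a notification channel from the alert level. The `Event` fields, the three-level severity scale, the channel policy, and the sample events are all hypothetical and are not taken from any specific AIOps platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str    # monitoring tool that reported the event
    host: str      # affected system
    symptom: str   # normalized description of the problem
    severity: int  # 1 = informational, 2 = warning, 3 = critical (assumed scale)


def collate(events):
    """Merge duplicate reports of the same problem, keeping the highest severity."""
    merged = {}
    for event in events:
        key = (event.host, event.symptom)
        current = merged.get(key)
        if current is None or event.severity > current.severity:
            merged[key] = event
    return list(merged.values())


def route(event):
    """Choose a notification channel from the alert level (assumed policy)."""
    channels = {1: "log only", 2: "email", 3: "SMS and automated remediation"}
    return channels[event.severity]


# Hypothetical events: two tools report the same traffic spike.
incoming = [
    Event("netflow", "edge-router-1", "traffic spike", 2),
    Event("syslog", "edge-router-1", "traffic spike", 3),
    Event("snmp", "db-server-2", "disk nearly full", 1),
]

for event in collate(incoming):
    print(f"{event.host}: {event.symptom} -> {route(event)}")
```

Here the two reports of the traffic spike collapse into a single critical alert, so staff receive one notification rather than two, while the low-severity disk warning is only logged. Keying on host and symptom is a deliberately simple choice; real platforms typically correlate on richer attributes such as time windows and topology.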
To look at a specific current example, AIOps is already making inroads into Network Operations Centre (“NOC”) operating platforms. Software-defined network operations products are already offered by major network equipment suppliers such as Cisco and HP.
As networks have grown more complex, and access to remote and local digital resources has become more critical to the business and its users and customers, AIOps has become essential to ensuring network availability. That is becoming ever more true as teleworking becomes the new normal.
Because AIOps takes over a good part of the routine work, network management staff can focus on communicating and collaborating to keep the NOC up and running. The tedious, labor-intensive work of monitoring and analyzing network logs is done for them.
In larger multi-site environments and campus-based organizations, the trend is moving rapidly toward a virtual NOC concept. A virtual NOC running an AIOps platform provides significant improvements in flexibility and speed of response. For distributed teams, perhaps including members working remotely, it acts as a framework in which the IT Operations workflow can respond in a coordinated manner.