In both the in-house and managed service provider (“MSP”) environments, managing and monitoring performance is a key task. Every IT department should have a performance monitoring strategy. In a typical IT environment, the service that a customer receives is dependent on:
Server performance, measured by a server monitor. Overall server performance is determined by the performance of several elements:
Processor utilization should be high, but with sufficient headroom to accommodate any sudden peak demands.
Network performance. The network should have sufficient bandwidth to deliver broadband-class response times, and routing algorithms need to be flexible, ideally self-healing and self-configuring, to maximise throughput.
Storage performance matters, and the question of HDD versus SSD comes to the fore. SSD offers far lower latency and much faster transfer times, but is significantly more expensive at high storage volumes.
For a cloud-based service, the performance of the Internet link.
The speed of the connection to the cloud service is generally out of the IT department's hands: the effective speed is that of the slowest link in the chain, and the connection may suffer random outages from time to time.
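The processor headroom point above can be sketched as a simple check. This is a minimal illustration, and the 20% headroom figure is an assumption for the example, not a recommendation from this article:

```python
def cpu_headroom_ok(utilisation_pct: float, required_headroom_pct: float = 20.0) -> bool:
    """Return True if the server retains enough spare CPU capacity
    to absorb a sudden peak in demand."""
    return (100.0 - utilisation_pct) >= required_headroom_pct
```

Feed it whatever utilisation figure your server monitor reports; the right headroom value depends on how spiky your workload is.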
The first step is to decide what you need to measure, and the acceptable performance level for each metric. It is also useful to ask why you need server monitoring. Is it to justify a systems upgrade, or to manage customer service levels against an SLA in a hosted server environment?
A typical performance monitoring strategy has six components:
The old management dictum applies here: “If you can’t measure it, you can’t manage it”. A current corollary is to make sure that there are no visibility gaps in your collection processes.
A key issue is to make sure that your server monitoring platform is properly scalable. Your data collection and reporting needs will change over time and probably increase.
Data collection needs to be granular enough for each metric to support analysis down to the polling-interval level, which in most cases should be one second.
The existing environment, coupled with the Cloud and IoT environments, generates enormous amounts of data. Simply put, that means provisioning enough storage to hold the data for one, and preferably more, polling intervals.
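A per-second collector can be sketched as follows. This is a hypothetical example: `metric_fn` stands in for whatever probe your monitoring tool actually exposes, and the bounded buffer keeps storage for the raw samples predictable:

```python
import time
from collections import deque

def collect(metric_fn, interval_s=1.0, samples=60):
    """Poll a metric at a fixed interval, keeping timestamped samples
    in a bounded buffer so storage stays predictable."""
    buf = deque(maxlen=samples)
    for _ in range(samples):
        buf.append((time.time(), metric_fn()))
        time.sleep(interval_s)
    return list(buf)
```

A real collector would run continuously and ship samples to a store rather than returning them, but the polling shape is the same.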
Creating a baseline
When the data collection set is available at the level of granularity required, the next step is to establish a baseline for each server monitoring metric.
Unless a baseline “normal” level of service has been defined, it is impossible to set alerts of unusual server or network performance.
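One common way to define such a baseline band (an illustrative choice, not the only one) is the mean plus or minus two standard deviations over a representative sample history:

```python
from statistics import mean, stdev

def baseline_band(history):
    """Return the (low, high) band of 'normal' values for a metric,
    defined here as mean ± 2 standard deviations of its history."""
    m, s = mean(history), stdev(history)
    return m - 2 * s, m + 2 * s
```

The width of the band controls sensitivity: a narrower band catches deviations earlier but generates more false positives.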
Defining Performance Alerts
Alerts are usually based on breaches of two criteria: a static threshold and baseline deviation. Beware, though: these need frequent review, particularly in the early stages of performance monitoring, as unusual but normal workloads might generate false-positive alerts.
When sufficient historical performance statistics are available for all metrics, they can be used to provide better predictions of service-impacting events and fewer false positives.
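The two alert criteria can be combined in a single check. This is a sketch; the threshold and band values in the usage are assumptions for illustration:

```python
def should_alert(value, static_threshold, band):
    """Fire when either criterion is breached: the static threshold,
    or a deviation outside the baseline (low, high) band."""
    lo, hi = band
    return value > static_threshold or not (lo <= value <= hi)
```

Note that a value can be well under the static threshold and still alert because it falls outside the baseline band, which is exactly the kind of rule that needs review early on.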
The term “report” is often misunderstood as referring to the periodic printed statistical reports for each metric. In a real-time monitoring environment, on-screen visual reporting is far more relevant to immediate needs.
Most server monitoring tools have a suite of canned visual and printed reports giving the performance statistics needed for basic server monitoring. However, they usually need to be supplemented by additional reports, online and on paper, that help with problem resolution.
Visual monitors must be as near real-time as possible, and how the data is stored in the monitoring tool can make a great difference as monitoring workloads grow.
Monitoring tools that store all data in a single monolithic database will show degraded reporting performance as the database grows. Either periodic archival or a move to a distributed database architecture will address the issue.
The objectives in data analysis are two-fold, predicting service-impacting events, and trouble-shooting them when they occur. A further benefit will be to assess the performance of the overall infrastructure, identifying bottlenecks, and where configuration changes might be of benefit.
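As a minimal illustration of prediction, a least-squares slope over recent samples can estimate how many polling intervals remain before a metric reaches a limit. This is a hypothetical helper assuming evenly spaced samples, not a method prescribed by any particular monitoring tool:

```python
def intervals_to_limit(samples, limit):
    """Fit a least-squares trend line to evenly spaced samples and
    estimate how many intervals until the metric reaches `limit`."""
    n = len(samples)
    mean_t = (n - 1) / 2
    mean_v = sum(samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(samples))
    den = sum((t - mean_t) ** 2 for t in range(n))
    slope = num / den
    if slope <= 0:
        return None  # flat or improving trend: no projected breach
    return (limit - samples[-1]) / slope
```

For example, a metric climbing steadily by 10 per interval and currently at 40 is projected to hit a limit of 100 in six intervals.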
Some tools allow modelling of infrastructure changes, leading to more informed strategic decisions about new and replacement infrastructure. This is significantly eased by having all the metric performance data in one place.
When the server monitoring environment is established, sharing the results of monitoring will greatly assist IT Team members and others with carrying out their functions. As an example, high-level metrics will help the CTO assess service level performance.
It is also possible with a stable monitoring platform to integrate metric data with other platforms, such as CRM or Fault management.
In summary, server monitoring means looking at the core requirements of your monitoring strategy, defining the key metrics, and setting their trigger levels.