Managed Service Providers (MSPs) and in-house IT departments see server monitoring as a key function to be carried out continuously. Many already have monitoring programmes in place to maintain service levels and watch for hardware or software issues.
Aiming for best practice will clearly help achieve optimal service provision. Here are five best-practice recommendations for server monitoring, but first, some practical advice.
- Don’t reinvent the wheel, and avoid “not invented here” thinking. Use the monitoring software that comes with the server OS, or a package such as SolarWinds. You wouldn’t write your own Asana or Wrike to get a collaborative PM tool, would you?
- A picture is worth a thousand words. Graphical representations of data are understood much more quickly than lines of numbers scrolling up a screen. Think of your car dashboard: quickly assimilated, but not a lot of writing there.
- No news is good news is a dangerous assumption. Short periods of silence from your monitoring system are normal; long ones are not. An absence of alerts can itself signal a failure in the monitoring system.
- People forget, or are absent. Automate alerts where possible; for whatever reason, people don’t always keep to schedules and protocols.
- Finally, on that point, use more than just email to send out alerts, but keep the signal-to-noise ratio high. Staff will quickly start ignoring SMS and email alerts if most messages are routine all-is-well notices rather than reports of critical problems.
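The “no news is good news” trap above can be guarded against with a simple watchdog: record a heartbeat each time the monitoring system completes a polling cycle, and flag prolonged silence as a possible monitoring failure. A minimal sketch, with a hypothetical five-minute threshold:

```python
import time
from typing import Optional

# Assumption: the monitoring system normally completes a cycle every
# few minutes; silence longer than this suggests the monitor itself died.
MAX_SILENCE_SECONDS = 300

def monitoring_is_silent(last_heartbeat: float, now: Optional[float] = None) -> bool:
    """Return True if the monitoring system has been quiet for too long."""
    now = time.time() if now is None else now
    return (now - last_heartbeat) > MAX_SILENCE_SECONDS

# A heartbeat ten minutes old should raise a flag.
print(monitoring_is_silent(last_heartbeat=1000.0, now=1600.0))  # True
```

In practice the watchdog would run on separate infrastructure from the monitor it is checking, so both cannot fail silently together.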
Set a Baseline
A baseline is the essential starting point for server monitoring: before you can identify abnormalities, you need to know what is normal. You also need to factor in business circumstances that change normal patterns of activity, such as seasonal fluctuations and marketing events.
For example, a sudden increase in network activity might reflect increased website visits following a sales promotion or Black Friday, rather than the early signs of a DDoS attack.
Baselines need to be revised after hardware or software changes.
A second benefit of baseline planning is that it gives an early indication of servers approaching their limits, helping with upgrade planning and growth scenarios.
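One common way to put a baseline to work is to compute the mean and spread of historical readings and flag anything far outside that range. A minimal sketch, with hypothetical request-rate figures; re-baselining after a hardware or software change is just a matter of recomputing from fresh samples:

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Compute mean and standard deviation from historical readings."""
    return mean(samples), stdev(samples)

def is_abnormal(value, baseline_mean, baseline_std, k=3.0):
    """Flag readings more than k standard deviations from the baseline."""
    return abs(value - baseline_mean) > k * baseline_std

# Hypothetical hourly request rates (requests/sec) from a normal week.
history = [120, 115, 130, 125, 118, 122, 128]
m, s = build_baseline(history)
print(is_abnormal(400, m, s))  # True: a surge well outside the normal range
```

A real deployment would maintain separate baselines per metric and per time window (weekday vs weekend, say), so that known seasonal patterns do not trigger false alarms.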
Server Core Usage
Most monitoring systems provide details of CPU and memory usage, storage capacity and network bandwidth. A minute-by-minute dashboard showing these key indicators is essential: it provides early warning of problems such as resource-hogging applications, memory leaks and network instability.
It will also expose intersystem issues where, for example, one system waits for another to release a database record.
This is where a graphical presentation is essential.
Policies and Procedures
It’s all very well to monitor core usage, but what do you do when you notice a potential problem? The monitoring software will highlight it and perhaps raise an alert, but an escalation procedure is needed to set out the response. Escalation typically has three levels: interesting, fix-it-now and critical, and procedures need to be in place for each.
Prudent organisations maintain an escalation matrix defining what happens when problems occur: who to contact, immediate remedial action, and so on. It is often developed in a workshop involving IT staff, external suppliers and any contractors. The group brainstorms likely scenarios and sets out what needs to be done, such as how to recover from a ransomware attack.
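An escalation matrix can be captured as plain data that both people and tooling can read. A minimal sketch using the three levels above; the contacts and actions are hypothetical placeholders for whatever the workshop agrees:

```python
# Hypothetical escalation matrix; a real one comes out of the team workshop.
ESCALATION_MATRIX = {
    "interesting": {"notify": ["ops dashboard"],
                    "action": "log and review at the next stand-up"},
    "fix-it-now":  {"notify": ["on-call engineer"],
                    "action": "investigate within 30 minutes"},
    "critical":    {"notify": ["on-call engineer", "IT manager"],
                    "action": "invoke the incident procedure immediately"},
}

def escalate(severity: str) -> dict:
    """Look up who to contact and what to do for a given severity."""
    # Unknown severities are treated as critical, never silently dropped.
    return ESCALATION_MATRIX.get(severity, ESCALATION_MATRIX["critical"])

print(escalate("fix-it-now")["action"])  # investigate within 30 minutes
```

Defaulting unknown severities upwards rather than downwards is a deliberate safety choice: a misclassified alert should cost a phone call, not an outage.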
Failover Planning for High Availability
Most IT systems need to run as close to 100% availability as possible. Perfect uptime is unattainable, but server monitoring can go a long way towards making systems highly available.
With the best will in the world, there will be system failures and consequent downtime from time to time. Unfortunately, an outage often takes your monitoring and analysis tools down with it, so you need a high-availability strategy for the monitoring tools themselves.
Current practice is therefore to design the server monitoring strategy so that the monitoring and measurement infrastructure remains available even when the major systems are experiencing downtime.
Recent developments have brought profile-based configuration techniques to the fore. Systems have individual role assignments, but also have properties shared with other systems.
The technique is to create role-based configuration profiles and assign them to individual systems. When you need to make a change, you change the profile, and the change propagates automatically to all affected systems. This reduces manual intervention and applies changes simply, consistently and, best of all, immediately.
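The profile mechanism can be sketched in a few lines: each server stores only a reference to a shared profile, so editing the profile updates every assigned system at once. All names here are hypothetical:

```python
# Shared profiles: each holds settings common to a role.
profiles = {
    "web-tier": {"poll_interval_s": 60, "alert_channel": "email"},
}

# Individual systems reference a profile rather than copying its settings.
servers = {
    "web-01": "web-tier",
    "web-02": "web-tier",
}

def effective_config(server: str) -> dict:
    """Resolve a server's configuration through its assigned profile."""
    return profiles[servers[server]]

# One change to the profile reaches all assigned servers immediately.
profiles["web-tier"]["poll_interval_s"] = 30
print(effective_config("web-01")["poll_interval_s"])  # 30
print(effective_config("web-02")["poll_interval_s"])  # 30
```

Real configuration-management tools layer per-server overrides on top of the shared profile, but the core idea is the same: settings live in one place and are resolved by reference, not duplicated.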
Systems monitoring has often been thought of as a Cinderella role in the IT world. While not glamorous, it is absolutely vital to the successful operation of the IT estate.