Whether in an organisation or providing hosted services, any IT department needs to monitor server performance. Users generally rate service levels on such subjective things as online response times, which critically depend on server performance.
Servers can and do go down, whether through human error, mechanical failure, or other external causes. In any business, downtime costs money and damages the company's reputation with customers.
This is particularly true in a shared hosting environment, where several users share server resources. Overuse of resources or misconfigured virtual machines can cause downtime or poor performance.
Server monitoring is essential to protect your business from downtime, financial loss, and reputational damage.
To quantify the risk: Amazon reportedly went down for around half an hour in 2013, an outage calculated to have cost the company at least $2 million in lost sales.
The first step is to define server monitoring.
Techopedia defines it as:
“the process of reviewing and analysing a server for availability, operations, performance, security and other operations-related processes”.
Monitoring is used as a preventative measure to identify any issues, including malware attacks, that might affect service delivery.
Service delivery is at the core of any IT service. Users expect to access systems and data quickly and to see a reasonable response time to their actions. In hosted service businesses, and in many large organisations, the parameters associated with service delivery are set out in a Service Level Agreement (“SLA”).
Not meeting SLA targets will result in financial penalties and disgruntled users. Server monitoring will give early warning of any potential issues adversely affecting service levels and protect the business.
What to Monitor
Like any computer, a server has three major components: CPU, memory, and storage.
The critical measurement for the CPU is utilisation. You might think that keeping utilisation as near to 100% as possible is ideal, but it is not. A CPU must handle both system and application service requests; at 100% utilisation, new requests must wait for current ones to complete, reducing service levels. In the extreme, it can lead to system failures if the queue grows too long.
A common target is 75% or below, leaving headroom to handle sudden spikes in demand.
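As a minimal sketch of that 75% rule, the check below approximates utilisation from the one-minute load average per core (the function names and the exact threshold are illustrative, not a standard; dedicated monitoring tools use more precise CPU counters):

```python
import os

# Illustrative threshold from the guideline above: keep average CPU
# utilisation at or below 75% to leave headroom for demand spikes.
CPU_THRESHOLD = 0.75

def cpu_over_threshold(utilisation: float, threshold: float = CPU_THRESHOLD) -> bool:
    """Return True when measured utilisation (0.0-1.0) breaches the threshold."""
    return utilisation > threshold

def current_utilisation() -> float:
    """Approximate utilisation as the 1-minute load average per core.

    Load average is not a perfect CPU metric, but it is available from the
    standard library on Linux/macOS without extra dependencies.
    """
    load1, _, _ = os.getloadavg()
    return load1 / (os.cpu_count() or 1)

if __name__ == "__main__":
    u = current_utilisation()
    status = "ALERT" if cpu_over_threshold(u) else "OK"
    print(f"CPU utilisation ~{u:.0%}: {status}")
```

In practice a monitoring tool would sample this on a schedule and alert only after several consecutive breaches, so a single spike does not page anyone.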
Having enough RAM is critical to server performance. In low RAM environments, currently idle tasks in RAM are swapped out onto disk storage (“swap space”). When the task resumes, it must be copied back into RAM. That takes time and processing cycles, again reducing performance and service levels.
In extreme cases, some processes will not start if there is insufficient free RAM.
Monitoring RAM availability will highlight consistently low levels of free RAM, indicating a need to add more and improve server performance.
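On Linux, free RAM can be read from `/proc/meminfo` without any extra tooling. The sketch below (function names are illustrative) parses that file and reports the fraction of memory still available, the figure a monitor would track over time:

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

def free_ram_fraction(info: dict) -> float:
    """Fraction of RAM still available to new processes (MemAvailable / MemTotal)."""
    return info["MemAvailable"] / info["MemTotal"]

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"Free RAM: {free_ram_fraction(info):.0%}")
```

A persistently low figure here, combined with heavy swap activity, is the signal described above that the server needs more RAM.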
Storage is a critical server resource. It holds system and application programs, plus permanent and temporary system and user data. Unless they are SSDs, disks are also among the few server components with moving parts, and are therefore more prone to failure.
Three key factors in disk storage are rotation speed, transfer rate, and available space.
- Non-SSD disks are rotating platters over which a read-write assembly floats. The correct bit of the platter needs to be under the read/write heads to read or write data. The faster the rotation speed, the better. Failing disk units often show a decline in speed.
- The transfer rate is how quickly data can be transferred to and from the disk. The faster the better.
- The CPU needs spare space on storage for swap space. If that space runs out, the system might halt.
Monitoring all disks for these critical parameters will help to avoid problems.
Servers with specific functions need additional monitoring. The focus will depend on the server type, for example:
- Application servers: availability, security, and performance;
- Storage servers: capacity, transfer speed, data loss, utilisation, and malware;
- Web servers: security, response times, availability, malware, and traffic load.
In summary, server monitoring’s primary goal is to maximise service levels and anticipate potential server failures.
Monitoring can be manual, using tools provided with the server operating systems. However, because of the importance and necessity of comprehensive server monitoring in data centres, it is nowadays more common to use specific server monitoring software tools.
The enormous volumes of data generated in the monitoring process support the need to use automated monitoring tools.
Some organisations have more than one tool, for example, a tool to measure server performance and a specific network monitoring tool.
Server monitoring is an essential component of business protection, particularly in the hosting environment. If you are concerned about cost, not having the data to analyse following a failure can be a lot more expensive.