Effective system monitoring, with automated alarm notifications to support staff, can preempt many problems and improve system availability. Such monitoring will typically include:
- Free disk space available
- CPU utilisation
- Memory utilisation
- RAID health
- Checking for disk errors
- Checking system logs for potential problems
- Ensuring essential services are running
- The status of the backups
- Whether security updates need to be installed
- Routine system security checks
- Checking the validity and lifetime of any SSL certificates
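As an illustration of the simplest item on that list, a free disk space check might look something like the sketch below. The 10% threshold and the function names are assumptions made for this example, not recommendations; a real monitoring system would also handle notification and repeat alarms.

```python
import shutil


def free_space_alarm(total_bytes: int, free_bytes: int,
                     min_free_fraction: float = 0.10) -> bool:
    """Return True when the free fraction of a filesystem is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_path(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Check the filesystem holding `path` (the threshold is an assumed default)."""
    usage = shutil.disk_usage(path)  # named tuple with total, used and free bytes
    return free_space_alarm(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    if check_path("/"):
        print("ALARM: low disk space on /")
```

Keeping the threshold test separate from the call that reads the disk makes the rule easy to test without needing a full disk.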
Business Process Monitoring
Extending the monitoring to cover key business processes is relatively easy. An online shop, for example, may know that it takes around 10 orders an hour. One check could look at the time of the last update to the “sales” table in the database and raise an alarm if it was more than 10 minutes ago.
This is by no means the only test that should be run on such a server, and it wouldn’t be very helpful in diagnosing the cause of the problem, but it would alert staff, who could then check that all is well. If an issue is found, it may be appropriate to add more specific checks that would alert staff to similar issues in future.
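The “sales” table check described above could be sketched along these lines. The table layout (a `created_at` column holding ISO 8601 timestamps) and the use of SQLite are assumptions for the sake of a self-contained example; a real shop would query its own database and schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_GAP = timedelta(minutes=10)  # the threshold used in the example in the text


def last_sale_time(conn: sqlite3.Connection) -> Optional[datetime]:
    """Return the newest timestamp in the hypothetical sales.created_at column."""
    row = conn.execute("SELECT MAX(created_at) FROM sales").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] is not None else None


def sales_look_stalled(last_sale: Optional[datetime], now: datetime,
                       max_gap: timedelta = MAX_GAP) -> bool:
    """True when there are no sales at all, or the newest is older than max_gap."""
    return last_sale is None or now - last_sale > max_gap


def demo() -> bool:
    """Build a throwaway in-memory table whose last sale was 25 minutes ago."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, created_at TEXT)")
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    stamp = (now - timedelta(minutes=25)).isoformat()
    conn.execute("INSERT INTO sales (created_at) VALUES (?)", (stamp,))
    return sales_look_stalled(last_sale_time(conn), now)
```

A shop with quieter periods overnight would probably want a threshold that varies with the time of day, which is the kind of refinement that tends to follow the first false alarm.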
IT is there to support the business, and monitoring its effectiveness in doing so is most certainly a worthwhile approach.
Trend Monitoring
Status monitoring helps to identify problems as they become apparent, whereas trend monitoring looks at system parameters over a longer period of time. Typically, many of the same parameters are measured, but they are displayed as graphs. This can be useful in a number of ways:
- It allows reasonable predictions to be made, for example in disk space or memory requirements.
- The cause of transient issues, such as slow performance at a given time, can be narrowed down.
Predictions
By way of example, the graph below shows the disk usage of a system over time. It can be seen that the /home partition (the top line) was filling up between June and early October. The system status monitor alerted support staff to the fact that the disk was getting full, and the graph enabled a judgement to be made that, unless something was done, the system would run out of space around the beginning of November.
In this particular case, some files that were no longer required were deleted, shown by the drop in mid-October, but it would have been possible to schedule the fitting of an additional or larger disk if that had been appropriate.
The use of system status monitoring and trend monitoring allowed the problem to be resolved before it impacted the business.
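The kind of judgement made from the graph can also be approximated numerically. The sketch below fits a straight line through a series of disk usage samples and estimates when usage would reach capacity. It assumes roughly linear growth, which is a simplification, and the function name and sample format are invented for this example.

```python
import math
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def predict_full_date(samples: Sequence[Tuple[date, float]],
                      capacity: float) -> Optional[date]:
    """Least-squares line through (date, used) samples.

    Returns the estimated date on which usage reaches `capacity`,
    or None if usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    origin = samples[0][0]
    xs = [(day - origin).days for day, _ in samples]
    ys = [used for _, used in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # not growing, so no predicted fill date
    intercept = mean_y - slope * mean_x
    days_until_full = (capacity - intercept) / slope
    return origin + timedelta(days=math.ceil(days_until_full))
```

A straight-line fit is only a rough guide: a one-off bulk upload or a clean-up, like the mid-October drop described above, will distort it, so the graph remains worth looking at.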
Log Monitoring
Monitoring Linux logs for potential problems is a critical part of ensuring high availability, and helps us be aware of problems before they impact your business. The System Logger is an integral part of Linux and, as its name implies, it keeps a record of things that happen on your server.
What’s Logged?
The logging system is very flexible, and may be configured to log pretty much anything. Some typical examples:
- a user logs in
- an email is received
- the internal clock is adjusted by 27 milliseconds
- an application creates a new customer record.
Mostly Harmless…
For the most part, the examples given above are of little interest. However, when a user reports that they’ve not received an expected email, the logs allow the system administrator to check whether that mail has been received by the system, and whether there were any problems with it (perhaps it was rejected because the recipient’s address was mistyped).
…But Not Always
Occasionally, there will be events logged that should be acted upon. Maybe a disk is reporting errors, or perhaps there are repeated attempts to log into a non-existent user account.
Bad: Small Needles, Big Haystacks
The difficulty is in finding the messages that are significant to your environment amongst the thousands of benign messages logged every day – certainly searching the logs manually is both time-consuming and inefficient.
Better: What Are You Looking For?
A better approach is to define what is being sought, and have a report sent each time a match is found. The challenge, though, is defining what to look for. Searching for “error” in the logs might highlight some interesting entries, but it won’t find a line reporting “Unknown user: fredbloggs”.
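A minimal sketch of this "search for what you want" approach shows the weakness. The log lines are made up for illustration, and the single "error" pattern is the one mentioned above.

```python
import re
from typing import Iterable, List, Pattern

# Patterns we have decided are worth reporting (only "error", as in the text).
WANTED: List[Pattern[str]] = [re.compile(r"error", re.IGNORECASE)]


def matching_lines(lines: Iterable[str],
                   patterns: List[Pattern[str]] = WANTED) -> List[str]:
    """Return only the log lines that match at least one wanted pattern."""
    return [line for line in lines if any(p.search(line) for p in patterns)]


# Hypothetical log lines, invented for this example.
SAMPLE = [
    "kernel: I/O error on device sda",
    "sshd: Unknown user: fredbloggs",
    "cron: job finished",
]
```

Calling `matching_lines(SAMPLE)` reports the disk error but silently drops the "Unknown user" line, because nothing in the pattern list anticipated it.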
Best: What Aren’t You Looking For?
Better still is to define what we don’t want to know about, and then report on everything else. This approach sends emails to the system administrator detailing everything in the logs that the system has not been told to ignore. As you might expect, initially that can be quite a lot of data, but over time we can filter out the benign messages.
The aim here is to only ever receive reports that will be acted upon: if something is reported that does not require action, that “something” should be added to the filters so it is no longer reported.
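The "report everything that isn't ignored" approach can be sketched in a few lines. The ignore patterns and log lines below are hypothetical; in practice the list is built up over time from the messages that turn out to be benign on a particular server.

```python
import re
from typing import Iterable, List, Pattern

# Messages known to be benign (hypothetical patterns, invented for this example).
IGNORE: List[Pattern[str]] = [
    re.compile(r"session opened for user \w+"),
    re.compile(r"clock adjusted by \d+ ms"),
    re.compile(r"message accepted for delivery"),
]


def unexpected_lines(lines: Iterable[str],
                     ignore: List[Pattern[str]] = IGNORE) -> List[str]:
    """Return every log line that does not match any ignore pattern."""
    return [line for line in lines if not any(p.search(line) for p in ignore)]


# Hypothetical log lines, invented for this example.
SAMPLE = [
    "sshd: session opened for user alice",
    "ntpd: clock adjusted by 27 ms",
    "sshd: Unknown user: fredbloggs",
]
```

Here `unexpected_lines(SAMPLE)` reports the "Unknown user" line even though nobody thought to search for it. If a reported line turns out to need no action, a new pattern is added to `IGNORE` so it is not reported again.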
What We Get
The end result will typically be a small number of short reports detailing the log entries that didn’t match the “expected” ones, and which require action. It is that action that increases the security, availability or performance of your server – and in today’s business environment, that’s essential.
We have been using Tiger Computing for more than 10 years and they have provided us a server with 100% uptime and off-site backups. Their technical support has been second to none. I highly recommend them to anyone who needs trouble-free IT.
– MIKE VINCE, MANAGING DIRECTOR (MONODE)
We Can Monitor Your Linux Systems, Too
See how our Linux System Monitoring services can benefit your business.