Linux has a deservedly good reputation for reliability, but with a little effort you can spot – and resolve – issues that are bubbling under the surface and which may, in time, affect your business.
The log files that Linux maintains as an integral part of the system are rich in detail about what’s happening on it. Examining them often reveals areas that would benefit from some attention – but there’s a problem.
What is Logged?
The short answer is, “a lot”. The logging system is very flexible, and may be configured to log pretty much anything. Some typical examples:
- a user logs in
- an email is received
- the internal clock is adjusted by 27 milliseconds
- an application creates a new customer record
- a disk experiences an error reading some data on the first try, but succeeds on the second
Non Problems
For the most part, the examples given above are of little interest. However, when a user reports that they’ve not received an expected email, the logs allow the system administrator to check whether that mail has been received, and whether there were any problems with it.
An example might be an incoming email in which the recipient’s address is mistyped. The user reports that an expected email hasn’t been received, and a check of the log files shows that a mail was received for ‘jo.smiht@example.com’. In days past, that mail might have been returned to the sender with some kind of “no such recipient” message, but – owing to the amount of spam around – today many mail systems don’t return such messages, but rather just delete them.
That’s an example of where the logs can shed light upon a problem reported by a user. This isn’t something we need to proactively look for: there will be a large number of emails that are not delivered for a variety of reasons, and on the odd occasion that a user reports such a problem, we can check the logs to see what happened.
The Problem
Occasionally, there will be events logged that should be acted upon. Maybe a disk is reporting errors, or perhaps there are repeated attempts to log into a non-existent user account. It would be nice to know about those log entries, but the challenge is in finding the messages that are significant to your environment amongst the thousands of benign messages logged every day. Searching the logs manually is both time-consuming and inefficient.
The Wrong Solution
One approach is to define what is being sought, and have a report sent each time a match is found. The challenge, though, is defining what to look for. Searching for “error” in the logs might highlight some interesting entries, but it won’t find a line reporting “Unknown user: fredbloggs”.
It will also produce lots of “false positives”. Our email problem above, where a mail is received for an “unknown” user, is just one example. It doesn’t take many such log entries for the report to become largely meaningless and therefore largely ignored. A key aim of system management is to avoid receiving reports that are routinely ignored.
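To make the weakness concrete, here is a minimal Python sketch of the keyword approach. The log lines are invented for illustration and do not come from any real system; the function name `keyword_report` is likewise hypothetical.

```python
import re

# Hypothetical log lines, invented purely for illustration.
SAMPLE_LINES = [
    "sshd: Invalid login attempt: Unknown user: fredbloggs",
    "kernel: sda: read error on sector 12345, retry succeeded",
    "myapp: loaded module error_reporting",
    "cron: job finished",
]


def keyword_report(lines, keyword="error"):
    """Return the lines that contain the keyword, ignoring case."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


matches = keyword_report(SAMPLE_LINES)
for line in matches:
    print(line)
```

In this sketch the report includes the harmless "error_reporting" line, yet says nothing about the attempted login by an unknown user.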
How To Check Logfiles
A better approach is to do the opposite: define what we don’t want to know about, and then report on everything else. All of the same information is logged and available if required, but the report that is generated contains only what is left after the benign messages have been filtered out.
The aim here is to only ever receive reports that will be acted upon: if something is reported that does not require action, that “something” should be added to the filters so it is no longer reported. Initially there’s likely to be a lot of benign data reported, but over time such data can be filtered out, and the reports become significantly more valuable.
The end result should be a small number of short reports detailing the log entries that didn’t match the “expected” ones, and which require action. It is that action that increases the security or availability or performance of your server.
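The filtering idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular tool implements it: the ignore patterns, the sample log lines and the function name `unexpected_lines` are all assumptions made up for the example.

```python
import re

# Hypothetical "benign" patterns: things we have decided we do not
# want to hear about. In practice this list grows over time.
IGNORE_PATTERNS = [
    r"session opened for user \w+",
    r"clock adjusted by \d+ ms",
    r"mail received for \S+@\S+",
]

# Hypothetical log lines, invented purely for illustration.
SAMPLE_LINES = [
    "login: session opened for user jo",
    "timesync: clock adjusted by 27 ms",
    "mail: mail received for jo.smiht@example.com",
    "kernel: sda: read failed on first try, succeeded on retry",
    "login: Unknown user: fredbloggs",
]


def unexpected_lines(lines, ignore_patterns):
    """Return every line that matches none of the ignore patterns."""
    compiled = [re.compile(p) for p in ignore_patterns]
    return [
        line for line in lines
        if not any(rx.search(line) for rx in compiled)
    ]


report = unexpected_lines(SAMPLE_LINES, IGNORE_PATTERNS)
for line in report:
    print(line)
```

Only the disk retry and the unknown-user login survive the filter here. If the disk message turned out to be benign in your environment, you would add a pattern for it and it would drop out of future reports.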
What Software To Use
Logcheck is a good implementation of the method described above, and is readily available for Debian (and derived) distributions.
Do not confuse Logcheck with the similarly-named Logwatch. Logwatch mails a daily summary of log activity, but – whilst occasionally interesting – it fails the acid test of an automated email, which is “what action must I take from this mail?” The answer, for most Logwatch mails, is “nothing”, whereas the Logcheck mails are either alerting you to a problem or indicating that the Logcheck filters should be refined.
Why You Should Do This
Log file reporting will provide valuable warnings when things are not as they should be. When implemented as described above, the amount of “noise” should be minimal, and thus you are only alerted to things that you can and should do something about.