Learn to love your log files

How to stop worrying and turn your big, dumb pile of impenetrable event data into instant actionable intelligence

Considering how much information is available in log files, you'd think companies would pay more attention to them. Client computers, servers, firewalls, network devices, and other appliances generate reams of event logs every day, but these logs often go ignored.

Ignoring logs is a security sin, but it's understandable on many levels. First, logs can contain vast numbers of uninteresting events. In fact, with rare exceptions, most logs are nothing but noise. At one current client, 1,000 computers and one perimeter firewall generate 25GB of log files on a daily basis. Out of that, in a typical week, not a single event is a true security issue requiring an immediate response. Oh, security events do happen, but when they do, they are normally buried in a sea of unimportant noise.

Second, log file review is rarely a management priority -- until a tipping-point event occurs or the auditors complain loudly enough. Third, when the staff is already overworked, messing with something that provides so little real-time value seems wasteful. Lastly, few people get excited about reviewing log files. The answer to "Hey, Johnny, what do you want to be when you grow up?" is never "Log File Reviewer."

So why care about log files? Because most malicious exploits and intrusions leave their fingerprints all over the log files. If the log file management system is crafted correctly, it can provide true real-time value. In this column, I'll attempt to give those of you hoping to improve your log file management system the CliffsNotes version of how to pull off a successful program.

Sawing logs

You can start by reading NIST's Special Publication 800-92, "Guide to Computer Security Log Management." Released in September 2006, it's unusually easy to read for a NIST publication and extremely useful for deploying event log management systems in the real world. It's considered the gospel for this small corner of the computer security world.

The NIST guide steps through all of the essentials of log file management: identifying the threats and risks to your environment; determining policies for logging, auditing, and handling logs; collating, indexing, and normalizing logs for analysis; defining and generating alerts and actions for critical events; and defining reports and metrics for management review. From putting log management infrastructure and processes into place to reviewing and archiving logs, it leaves no stone unturned.

One of the most important determiners of success is how much you can automate the process -- because what you don't automate, you probably won't do. For example, it's critical to let an event management system (often described with the acronyms SIM, SEM, or SIEM) do all the hard work. It should be configured to collect, filter, and analyze the data, and to prioritize and generate alerts. You don't want to stare at reported events all day deciding which ones should be acted upon.

For every event record you collect, you need to determine its criticality. Most event log records are unimportant, meaning that they don't lead to an alert or action item. Even so, you must store them for some period of time in case they are needed for troubleshooting or forensic analysis after the fact. The big questions are where to store them and where to filter them.

One school of thought is that all events, regardless of criticality, should be sent to a centralized server. Then if they are ever needed, investigators need only look in one place to see all events. A central repository is a great idea in a perfect world. But considering that 1,000 computers and a firewall can generate 25GB of data each day, sending all of your logs to one place can have a huge impact on the network and centralized storage. Even a multiterabyte SAN can hold only so many days of data.
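To put rough numbers on that limit, here is a back-of-the-envelope sketch. The 4TB capacity is an assumed figure for illustration, not one from the client above, and the math ignores compression, indexing overhead, and growth.

```python
def retention_days(capacity_tb: float, daily_gb: float) -> float:
    """Days of raw logs that fit in the given capacity (1 TB = 1,000 GB)."""
    return capacity_tb * 1000 / daily_gb


# Assumed 4 TB of usable space at the 25 GB/day rate cited above.
days = retention_days(4, 25)
print(f"{days:.0f} days of retention")  # 160 days of retention
```

Under those assumptions, a retention policy measured in years rather than months would need either far more storage or filtering before the data reaches the central server.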

Another idea is to filter out unimportant events at the client and send only medium and critical events to the centralized server. If the unimportant events are needed, then they can be examined on the client in question or pulled into the centralized server whenever deeper analysis is needed.
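Client-side filtering of this kind might look something like the following minimal sketch. The three-level severity scale, the dictionary event format, and the function name are all assumptions for illustration; real agents and event management products define their own levels and record formats.

```python
from typing import Iterable, Iterator

# Hypothetical severity scale; real products use their own levels.
SEVERITY_RANK = {"low": 0, "medium": 1, "critical": 2}


def events_to_forward(events: Iterable[dict], minimum: str = "medium") -> Iterator[dict]:
    """Yield only the events at or above the minimum severity."""
    floor = SEVERITY_RANK[minimum]
    for event in events:
        if SEVERITY_RANK[event["severity"]] >= floor:
            yield event


# Made-up local events; the low-severity one stays on the client.
local_log = [
    {"id": 1, "severity": "low"},
    {"id": 2, "severity": "critical"},
    {"id": 3, "severity": "medium"},
]
forwarded = list(events_to_forward(local_log))
print([event["id"] for event in forwarded])  # [2, 3]
```

Nothing is deleted in this sketch: the full local log remains on the client, so the low-severity events are still there if an investigation needs them.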

What determines a critical event?

Critical events should always lead to an immediate alert and a responsive investigation. If your configuration ends up generating so many critical events a day that responders can't keep up or action items queue up ignored, then you've defined your critical events too broadly.

An event record should be defined as critical when it indicates malicious activity. For example, suppose you rename your Administrator account and fine-tune your network so that nothing legitimate ever looks for or uses the log-on name Administrator. If even a single log-on event shows the name Administrator being used, you have an actionable event.
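As a minimal sketch of such a rule, assume log-on events arrive as dictionaries with a hypothetical `account` field (real log formats will differ). Any match is critical, with no threshold to tune:

```python
# Names that nothing legitimate should use once the real account is renamed.
DECOY_ACCOUNTS = {"administrator"}


def is_decoy_logon(event: dict) -> bool:
    """Return True if the log-on event used a decoy account name."""
    account = str(event.get("account") or "")
    return account.lower() in DECOY_ACCOUNTS


print(is_decoy_logon({"account": "Administrator", "result": "failed"}))  # True
print(is_decoy_logon({"account": "jsmith", "result": "success"}))        # False
```

The comparison is case-insensitive because Windows treats account names that way; whether the log-on succeeded or failed does not matter to the rule.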

It's important not to write off all common events as noise. For example, in the typical network, log-on attempts will be among the most commonly recorded events. Even failed log-ons using incorrect names or passwords are normal. But don't filter them out. You need to collect and analyze them as they are often the first signs of an unauthorized intruder or malware worm going ape on your network.

For every recorded event ID, you need to establish a baseline measurement over three weeks to three months. Find out what is normal on a daily basis, by the hour, and over the long term. Then set thresholds from that baseline and generate critical alerts when they are exceeded.
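One simple way to turn a baseline into a trigger is to alert when a count rises well above the historical mean. The sketch below uses mean plus three standard deviations, which is an assumption chosen for illustration rather than a rule from the NIST guide; the hourly counts are made up, and a real baseline would cover weeks of data.

```python
from statistics import mean, pstdev


def alert_threshold(hourly_counts: list[int], k: float = 3.0) -> float:
    """Baseline mean plus k standard deviations of the hourly counts."""
    return mean(hourly_counts) + k * pstdev(hourly_counts)


# Made-up failed log-on counts per hour for one event ID.
baseline = [2, 4, 3, 5, 1, 3]
threshold = alert_threshold(baseline)

for count in (5, 50):
    status = "ALERT" if count > threshold else "normal"
    print(count, status)
```

A single global threshold is the crudest option. Keeping separate baselines by hour of day or day of week would better match the advice to learn what is normal daily, hourly, and over the long term.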

Continuing my log-on example, a few failed log-ons an hour can be expected, but if you were to see 50 failed log-ons in a second or 50 failed log-ons between client workstations that have no reason to communicate, then you have an event that needs investigating. Behavior thresholds are a little more art than science, but you have to start somewhere.
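The 50-in-a-second case can be sketched as a sliding-window check. The function name, the default limit, and the use of plain numeric timestamps (seconds) are assumptions for illustration; a real event management system would express this in its own rule syntax.

```python
from collections import deque
from typing import Iterable, Optional


def find_burst(timestamps: Iterable[float], limit: int = 50,
               window: float = 1.0) -> Optional[float]:
    """Return the time at which `limit` events first fall within
    `window` seconds of each other, or None if that never happens."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that have aged out of the window.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= limit:
            return ts
    return None


# A failed log-on every minute is background noise...
quiet = [i * 60.0 for i in range(100)]
# ...but 50 failures within half a second is not.
burst = [1000.0 + i * 0.01 for i in range(50)]

print(find_burst(quiet))                      # None
print(find_burst(quiet + burst) is not None)  # True
```

The second pattern in the example above, log-ons between workstations that have no reason to communicate, needs a different rule keyed on source and destination rather than on rate.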

A log with no noise

Two other points: One, consider setting up one or more honeypot systems. I've covered honeypots more than a handful of times in this column. Take a computer you're getting ready to throw away and turn it into your secret honeypot. Don't even let most of the people in IT know about it (this helps catch more trusted insiders doing things they shouldn't). A honeypot is a fake asset, and nothing should ever touch it after you tune out the normal broadcast traffic. Its only job is to create an actionable alert if something tries to connect. A honeypot is low cost, low risk, low noise, and high value -- I wish I could say the same for other security software. My longtime favorite honeypot software product is KFSensor.
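The honeypot rule is about as simple as alerting gets. This sketch is not how KFSensor works; it assumes you already have connection records (from a firewall or flow log, say) as dictionaries with hypothetical `src` and `dst` fields, and the honeypot address is made up.

```python
import ipaddress

# Hypothetical address of the honeypot machine.
HONEYPOT = ipaddress.ip_address("10.0.0.250")


def honeypot_alerts(flows: list[dict]) -> list[dict]:
    """Return every flow addressed directly to the honeypot.

    Broadcast traffic goes to the broadcast address, not to the
    honeypot's own address, so it never matches.
    """
    return [f for f in flows if ipaddress.ip_address(f["dst"]) == HONEYPOT]


flows = [
    {"src": "10.0.0.17", "dst": "10.0.0.255"},  # subnet broadcast, ignored
    {"src": "10.0.0.42", "dst": "10.0.0.250"},  # touched the honeypot
    {"src": "10.0.0.42", "dst": "10.0.0.5"},    # ordinary traffic
]
for alert in honeypot_alerts(flows):
    print("investigate", alert["src"])  # investigate 10.0.0.42
```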

Lastly, consider outsourcing the whole thing if you don't have the time, equipment, expertise, or software. There are dozens of excellent companies that can take you from no log analysis to top-notch log analysis in a short time, but I can recommend no better company than my friend Bruce Schneier's BT Counterpane. There are other excellent competitors, but Counterpane is always on my short list of recommendations.

The idea is to create a nearly self-managing event log system, where only aberrant events get turned into action items to be investigated. Sure, plenty of those investigated items will turn out to be legitimate or technically misbehaving events, but you'll have another great tool in your arsenal next to your intrusion detection systems, firewalls, and antimalware software. And we need all the help we can get.

Copyright © 2009 IDG Communications, Inc.
