How to automate threat hunting

The quest for hidden threats...


The SOC teams responsible for detecting and stopping breaches are famously short-staffed. That’s troubling, because data breaches were up dramatically in 2017.

With each passing year, the threat detection problem remains unsolved, and it may be getting worse.

The root cause is that SOC teams have far more data than they can handle or know what to do with. This happens for a few reasons:

  1. SOC teams don’t have expertise to build high quality rules
  2. “Out of the box” rules don’t work; they end up generating too many false positives
  3. They don’t have the bandwidth to go threat hunting

Gartner’s Anton Chuvakin believes that only 0.1 percent of organizations will have the capabilities to be successful at threat hunting on their own.

If SOC teams don’t have the manpower to proactively hunt down new threats, what if we could teach machines to hunt? Human analysts would train and tune the automation, scaling their expertise to get the job done.

But what exactly is the job that needs to be done? What are we really automating when we automate threat hunting?

Threat hunting is the proactive search for indications of any IT security threat or compromise. It is about filling in the gaps in the SIEM rule set: accounting for the false negatives (situations where alerts should have been raised but were not) in rule-based security systems.

Threat hunting is successful when SOCs are able to detect the vast majority of threats in their data, in a very timely fashion.

Threat hunting is a classification problem

In their sleuthing for threats, SOC teams aren’t starting from scratch. They have lots of data to work with, including log files, SIEM rules and alerts, and other data on the security status of their IT infrastructure.

To find new threats, SOC teams need to analyze security events, billions of which occur every day in a large enterprise.

To analyze an event, they need to consider all associated attributes or factors. It’s these factors that make an event interesting or not from the IT security point of view.

For example, one factor might be that an event occurred at a certain time. Another factor might be that the event involved a certain device. Another factor might be that it involved network traffic from a specific user, and so on. Some factors might be explicit; others might be latent, recognizable only when correlated with other factors.

Factors differentiate one event from another. Identifying these factors and labeling them is sometimes known as enriching data. The more factors you have and the richer your data set, the better the quality of your data is for threat hunting.
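
To make the idea of enrichment concrete, here is a minimal Python sketch. The event fields, the SUPERUSERS set, the off-hours window and the enrich function are all hypothetical names and values chosen for illustration; they do not come from any particular SIEM product. The sketch derives explicit factors from a raw event and then combines them into one latent factor.

```python
from datetime import datetime

# Hypothetical list of privileged accounts; a real deployment would pull
# this from a directory service or asset inventory.
SUPERUSERS = {"root", "admin"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with derived factors added."""
    ts = datetime.fromisoformat(event["timestamp"])
    enriched = dict(event)

    # Explicit factors, read directly from single fields of the event.
    enriched["is_weekend"] = ts.weekday() >= 5  # Saturday is 5, Sunday is 6
    enriched["is_off_hours"] = ts.hour < 6 or ts.hour >= 20  # assumed window
    enriched["is_superuser"] = event["user"] in SUPERUSERS

    # A latent factor: only visible when other factors are correlated.
    enriched["privileged_off_hours"] = enriched["is_superuser"] and (
        enriched["is_weekend"] or enriched["is_off_hours"]
    )
    return enriched


raw = {"timestamp": "2018-09-01T03:00:00", "user": "root", "action": "file_create"}
event = enrich(raw)
print(event)
```

The raw event is left untouched, so the same log record can be enriched again later if the definition of a factor changes.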

When examining factors, security analysts can’t afford to rely on sampling, which produces errors about 3 percent of the time. Breaches occur in 0.0003 percent of security events, so sampling with an error rate of 3 percent will miss too many indications of possible breaches.
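
A quick calculation shows the scale of the mismatch. The sketch below assumes one billion events per day, a round number chosen only for illustration, and reads the 3 percent figure as the share of events a sampling approach could get wrong. The two rates are the ones quoted above.

```python
from fractions import Fraction

# Assumption for illustration: one billion security events per day.
EVENTS_PER_DAY = 1_000_000_000

# Rates quoted in the text, kept as exact fractions to avoid float rounding.
BREACH_RATE = Fraction(3, 1_000_000)  # 0.0003 percent
SAMPLING_ERROR = Fraction(3, 100)     # 3 percent

breach_events = EVENTS_PER_DAY * BREACH_RATE
events_within_error = EVENTS_PER_DAY * SAMPLING_ERROR
ratio = events_within_error / breach_events

print(f"Breach-related events per day: {int(breach_events):,}")
print(f"Events inside the sampling error: {int(events_within_error):,}")
print(f"The error margin is {int(ratio):,} times larger than the signal")
```

Under these assumptions, the events that fall inside the sampling error outnumber the breach-related events by four orders of magnitude, which is why every event needs to be examined.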

Instead of sampling security events, threat hunting relies on scoring. Security analysts assign each factor a numerical score that estimates the factor’s importance.

For example, let’s say a file is created on a server. The name of the file might be quite conventional, earning the file-name factor a rating of 1. The fact that the file was created by a superuser might earn the file-creator factor a rating of 10. Because the file was created at 3 am on a Saturday morning when the data center was empty, analysts might decide to assign the creation-time factor a rating of 10 as well.

It’s through this detailed scoring of factors associated with events that threat hunters can identify anomalous events that bear closer inspection. Scores of factors are combined, and the high-scoring outliers clearly stand out, signaling the possibility of a threat.
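
The scoring described above can be sketched in a few lines of Python. The first event uses the ratings from the file-creation example; the other events are invented for contrast. Summing the factor ratings and flagging anything above twice the median score are assumptions made for this sketch, since the text does not say how scores are combined or where the cutoff lies.

```python
from statistics import median

# Factor ratings per event. The first entry uses the ratings from the
# file-creation example; the remaining events are invented for contrast.
events = {
    "superuser creates file, Saturday 3 am": {
        "file_name": 1, "file_creator": 10, "creation_time": 10},
    "user creates file, weekday noon": {
        "file_name": 1, "file_creator": 1, "creation_time": 1},
    "service account rotates log, weekday": {
        "file_name": 2, "file_creator": 1, "creation_time": 1},
    "user creates file, weekday evening": {
        "file_name": 1, "file_creator": 1, "creation_time": 3},
    "admin edits config, weekday morning": {
        "file_name": 1, "file_creator": 3, "creation_time": 1},
}


def combined_score(factors: dict) -> int:
    """Combine factor ratings by simple addition (an assumed rule)."""
    return sum(factors.values())


scores = {name: combined_score(factors) for name, factors in events.items()}

# Assumed outlier rule: flag any event scoring more than twice the median.
threshold = 2 * median(scores.values())
outliers = [name for name, score in scores.items() if score > threshold]

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    flag = "  <-- inspect" if name in outliers else ""
    print(f"{score:>3}  {name}{flag}")
```

With these made-up ratings, only the superuser's Saturday-morning file creation clears the threshold. A production system would more likely learn weights and thresholds from data than hard-code them.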

Challenges with classifying events

That’s how factor scoring works in theory. In practice, classifying event data is time-consuming and difficult. There are several reasons why analysts encounter challenges with classifying events.

  • Managing the scope and variety of events
    A single application such as NetSuite or Oracle might have thousands of associated events, and each of those events may have many features. Those features might change with each new software release or added feature. The sheer scope and variety of the billions of events that occur each day in a modern enterprise make classifying security events a daunting task.
  • Identifying relevant factors
    Not all factors are equally relevant for the purposes of threat detection. Organizations need to ensure their analysis is focusing on the factors that matter. In practice, this requires a knowledge of the organization itself, its normal operations and the usage patterns of its IT resources.
  • Determining how to enrich data
    Enriching data makes it more useful for predictive analytics. Knowing which data to combine, correct or extend is critical, but also very difficult. It often requires human intervention and can be time-consuming work.

How software automation can help

Every skilled security analyst whose work includes threat hunting follows a predictable process. They search for high-scoring anomalous events, and if they find any, they examine them closely to determine if they indicate the presence of a genuine threat.

Software can apply these same strategies and tactics. You can train software to look for anomalies in events, and to examine them for indications of threats. What’s more, through automation, you can scale up this analysis so that a vast number of events are examined in a fraction of the time required by humans. If the automated threat hunting is based on machine learning, you can train the software to accept and apply corrections and refinements from security analysts, so that the automated threat hunting becomes increasingly fast and accurate over time.
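
As a rough illustration of learning from analyst corrections, the sketch below trains a perceptron, one of the simplest machine learning classifiers, on a handful of analyst-labeled events. The three binary factors and the labels are invented for this example, and a perceptron is only a stand-in for whatever model a real product uses. Whenever the model disagrees with the analyst's label, its weights are nudged toward the analyst's answer.

```python
# Each event is a tuple of binary factors (all hypothetical):
#   (is_superuser, is_off_hours, unusual_file_name)
# The label is the analyst's verdict: 1 = threat, 0 = benign.
labeled_events = [
    ((1, 1, 0), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 0), 0),
    ((1, 1, 1), 1),
    ((0, 0, 1), 0),
]


def predict(weights, bias, factors):
    """Return 1 (threat) if the weighted sum of factors is positive."""
    total = sum(w * x for w, x in zip(weights, factors)) + bias
    return 1 if total > 0 else 0


def apply_correction(weights, bias, factors, analyst_label, rate=1):
    """Perceptron update: shift the weights only when the model was wrong."""
    error = analyst_label - predict(weights, bias, factors)
    new_weights = [w + rate * error * x for w, x in zip(weights, factors)]
    return new_weights, bias + rate * error


weights, bias = [0, 0, 0], 0
for _ in range(10):  # several passes over the analyst feedback
    for factors, label in labeled_events:
        weights, bias = apply_correction(weights, bias, factors, label)

for factors, label in labeled_events:
    print(factors, "analyst:", label, "model:", predict(weights, bias, factors))
```

After training, the model agrees with the analyst on every labeled event. In this toy data set the analyst treats only superuser activity during off hours as a threat, and the learned weights reflect that.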

Automating threat hunting addresses the three challenges listed above.

  • Automate event analysis
    Automated analysis can dramatically broaden the scope of events being examined. Good automation platforms can rapidly analyze millions or billions of events.
  • Automate factor identification
    Machine learning can identify factors that bear analysis either through explicit labeling provided by an analyst or through discovery performed by the software itself.
  • Automate data enrichment
    Software can help with enriching data by automatically clustering similar events together and performing root-cause analysis.
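
The clustering step can be sketched as follows. This is a minimal greedy approach, assuming each event is represented as a set of factor strings and that two events are similar when their Jaccard similarity (shared factors divided by total distinct factors) is at least 0.5. The events, the factor names and the threshold are all hypothetical, and real platforms typically use more robust clustering algorithms.

```python
def jaccard(a: set, b: set) -> float:
    """Share of factors two events have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(events, threshold=0.5):
    """Greedy clustering: join the first cluster whose first member is
    similar enough, otherwise start a new cluster. Returns lists of
    indices into `events`."""
    clusters = []
    for index, event in enumerate(events):
        for members in clusters:
            if jaccard(event, events[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Hypothetical enriched events, each a set of factor strings.
events = [
    {"action:file_create", "user:root", "time:off_hours", "host:db01"},
    {"action:file_create", "user:root", "time:off_hours", "host:db02"},
    {"action:login", "user:alice", "time:business_hours", "host:web01"},
    {"action:login", "user:bob", "time:business_hours", "host:web01"},
]

clusters = cluster(events)
print(clusters)
```

Here the two off-hours file creations by root land in one cluster and the two business-hours logins in another, so an analyst can review each group once instead of inspecting every event individually.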

New security automation solutions can apply this approach to threat hunting: ranking events by severity, raising alerts on any event the software deems suspicious or an outright threat, and enabling SOC teams to focus immediately on the most critical threats before they can inflict damage.

Automating threat hunting does not eliminate the need for human security analysts. On the contrary, human analysts will always be needed to provide a comprehensive, grounded understanding of context and priority and to make critical decisions in ambiguous circumstances. What automation does is free analysts to focus on the most critical events sooner. It also gives them confidence that, while they focus on specific events, threat hunting continues in the background on a vast, automated scale to help keep the enterprise safe.

This article is published as part of the IDG Contributor Network.
