3 security analytics approaches that don’t work (but could) — Part 2

Bayesian networks, machine learning and rules-based systems can be vital tools for organizations and security analysts if they are thoughtfully built, combined and applied

A security analytics approach that exploits the unique strengths of Bayesian networks, machine learning and rules-based systems—while also compensating for or eliminating their individual weaknesses—leads to powerful solutions that are effective across a wide array of security missions.

Despite the drawbacks of security analytics approaches I described in part 1 of this series, it's possible to build such solutions today, giving users a way to rapidly identify their highest-priority security threats at very large scale without being deluged with false-positive alerts or being forced to hire an army of extra analysts.

The first step toward success is understanding the ultimate user. Solutions that provide software libraries or toolboxes for developers fail because they’re aimed at the wrong audience. Instead, the ideal security analytics solution must be built explicitly for (and with input from) end users, such as insider threat analysts in a security operations center (SOC) or those assessing highly cleared government personnel.

The next step is to follow the proper development sequence. For any solution to be effective, it must begin with a thorough grasp of the problem the user is trying to solve, not with a hunt for answers in masses of data. The best way to do that is to build a model of the problem domain, with direct input from the people who are deeply familiar with its dynamics and constituent elements.

With the right models, direct rules and probabilistic inference can operate side by side, and they can be integrated to create easily traceable, transparent and generalizable representations of any security problem. Machine learning and data science techniques can then be applied to narrow, well-bounded subproblems, such as extracting a single indicator from raw data.

How to combine security analytics approaches to create a solution that works

With that as the backdrop, here is my take on how to combine the three analytics approaches in a way that does work.

Build your model first...

Arguably the most important model type in security analytics is the Bayesian inference network. It can efficiently represent probabilistic knowledge about a complex problem domain (such as insider threats) and then use that knowledge to reason intelligently in the domain even when the data is uncertain, incomplete or contradictory.

A common misconception is that building a Bayesian network is time-consuming and expensive. In fact, the process can be dramatically simplified in three steps: first identify the important problem concepts, then specify the qualitative relationships between those concepts, and finally use readily available software to assemble that qualitative knowledge automatically into a quantitative Bayesian model.
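To make the qualitative-to-quantitative step concrete, here is a minimal Python sketch. It assumes experts label each cause-effect link as "weak", "moderate" or "strong", and it uses a noisy-OR combination to turn those labels into a conditional probability table. The article does not name the software or the method used, so the noisy-OR choice, the label-to-number mapping, the leak value and the node names are all illustrative assumptions.

```python
from itertools import product

# Hypothetical mapping from an expert's qualitative label to a causal strength.
STRENGTH = {"weak": 0.2, "moderate": 0.5, "strong": 0.8}


def noisy_or_cpt(parents, leak=0.01):
    """Build P(child=True | parent states) from qualitative links via noisy-OR.

    parents: dict of parent name -> qualitative label ("weak", "moderate", "strong").
    leak:    assumed probability that the child is true with no active parent.
    Returns (names, cpt): names is the sorted parent list, and cpt maps a tuple
    of parent truth values (in that order) to a probability.
    """
    names = sorted(parents)
    cpt = {}
    for states in product([False, True], repeat=len(names)):
        # Probability that neither the leak nor any active parent triggers the child.
        p_none = 1.0 - leak
        for name, active in zip(names, states):
            if active:
                p_none *= 1.0 - STRENGTH[parents[name]]
        cpt[states] = 1.0 - p_none
    return names, cpt


# Illustrative expert input for a hypothetical "elevated insider risk" node.
names, cpt = noisy_or_cpt({"unusual_access": "strong", "financial_stress": "weak"})
for states, prob in cpt.items():
    print(dict(zip(names, states)), round(prob, 4))
```

The expert supplies only a label per link. The table grows with the number of parent combinations, but no one has to fill it in by hand.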

This means an organization can now tap a diverse array of security experts—say, a specialist in threat intelligence or network access behavior or human resources issues or fraud indicators—and allow them to express their domain knowledge in a natural way that doesn’t require extensive interaction with Bayesian modeling experts. This approach succeeds because analysts can manipulate the model to suit their own reasoning patterns and lexicons, resulting in a comfortably familiar analytical environment.

Rules, such as those found in a rules-based system, can then be expressed as absolute indicators on model nodes. In this manner, the techniques of Bayesian networks and rules-based systems can be merged into a single coherent framework for extracting meaning from data. This approach is more effective than rules-based systems alone because rules-based inference can be intermingled with more nuanced probabilistic inference in the same model, sometimes even for the same concept.
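One way to picture rules acting as absolute indicators is the sketch below. Each rule turns a raw event into a hard true/false observation on a model node, and a small naive-Bayes-shaped network then updates the probability of a threat from those observations. The node names, thresholds, prior and likelihoods are hypothetical values chosen for illustration, and a real model would have far more structure than this two-indicator example.

```python
# Hypothetical prior and likelihoods:
# node -> (P(node=True | threat), P(node=True | no threat))
PRIOR_THREAT = 0.01
LIKELIHOODS = {
    "after_hours_badge_in": (0.30, 0.05),
    "bulk_download": (0.60, 0.02),
}

# Rules: each one maps a raw event record to a hard True/False on a model node.
RULES = {
    "after_hours_badge_in": lambda e: e["badge_hour"] >= 22 or e["badge_hour"] < 5,
    "bulk_download": lambda e: e["files_downloaded"] > 1000,
}


def apply_rules(event):
    """Evaluate every rule and return hard evidence for the matching nodes."""
    return {node: rule(event) for node, rule in RULES.items()}


def posterior_threat(evidence):
    """P(threat | evidence), assuming indicators are independent given the threat state."""
    p_threat, p_benign = PRIOR_THREAT, 1.0 - PRIOR_THREAT
    for node, observed in evidence.items():
        lik_threat, lik_benign = LIKELIHOODS[node]
        p_threat *= lik_threat if observed else 1.0 - lik_threat
        p_benign *= lik_benign if observed else 1.0 - lik_benign
    return p_threat / (p_threat + p_benign)


suspicious = {"badge_hour": 23, "files_downloaded": 5000}
routine = {"badge_hour": 10, "files_downloaded": 12}
p_suspicious = posterior_threat(apply_rules(suspicious))
p_routine = posterior_threat(apply_rules(routine))
print(round(p_suspicious, 3), round(p_routine, 4))
```

The rules stay crisp and auditable, while the conclusion drawn from them remains a probability rather than a binary alert.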

...Then go get the data

The model is there to answer top-level questions, such as: 1) Is this manager or contractor trustworthy? 2) Is one of my critical assets at risk? or 3) Do this entity's actions pose a threat to my organization? The individual nodes of the model break each problem down into smaller and smaller concepts until they become causal indicators that are measurable in data.

Creating a model first thus provides a structure for identifying the data that will be applied to the model. This means that only after the models are built should users consider the specific data they might have on hand and how they can apply it to their particular security analytics problem. In this way they can select the data they need (and pinpoint critical data gaps) while ignoring the data that’s unlikely to be useful.
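The model-first selection of data can be sketched in a few lines of Python. The example assumes a model expressed as a nested dictionary in which each leaf indicator names the data field that would measure it. All concept names and field names are made up for illustration.

```python
# Hypothetical model: a top-level question broken down into concepts, then into
# leaf indicators. Each leaf names the data field that would measure it.
MODEL = {
    "Is this employee trustworthy?": {
        "Financial stress": {
            "Wage garnishment on file": "hr.garnishment_flag",
        },
        "Unusual access behavior": {
            "After-hours badge-ins": "badge.entry_time",
            "Bulk file downloads": "proxy.bytes_out",
        },
    }
}


def leaf_fields(node):
    """Walk the concept tree and collect the data field behind each leaf indicator."""
    if isinstance(node, str):
        return {node}
    fields = set()
    for child in node.values():
        fields |= leaf_fields(child)
    return fields


def plan_data(model, available):
    """Compare what the model needs with what the organization actually has."""
    needed = leaf_fields(model)
    return {
        "use": sorted(needed & available),     # data that feeds the model
        "gaps": sorted(needed - available),    # indicators with no data source yet
        "ignore": sorted(available - needed),  # data the model has no use for
    }


# Hypothetical inventory of data sources on hand.
AVAILABLE = {"badge.entry_time", "proxy.bytes_out", "email.subject", "dns.query"}
plan = plan_data(MODEL, AVAILABLE)
print(plan)
```

Because the model drives the comparison, the missing HR feed shows up as an explicit gap instead of going unnoticed in a pile of unrelated logs.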

Data comes in many formats, and at different velocities and volumes. Some data, such as home address, salary or other information about an individual, changes rarely and can be used directly in the model. Other data, such as network user activity, changes continuously. Given these variations, data-source mappings can be direct or complex. For the latter, machine-learning and other data-science techniques can be used to capture specific indicators or to extract indicators based on learned features in data.
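The difference between a direct and a complex data-source mapping can be illustrated as follows. The direct mapping reads a slowly changing attribute as is. The complex mapping learns a per-user baseline from history and flags large deviations. A simple z-score stands in here for the machine-learning step, which is an assumption made for brevity, and the field names and thresholds are hypothetical.

```python
from statistics import mean, stdev


def direct_indicator(record, minimum_level=3):
    """Direct mapping: a rarely changing attribute is used as is."""
    return record["clearance_level"] >= minimum_level


def learned_indicator(history, today, threshold=3.0):
    """Complex mapping: compare today's activity with a baseline learned from history.

    history:   past daily counts for one user (at least two values).
    today:     today's count.
    threshold: number of standard deviations above the mean that counts as anomalous.
    """
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        # Flat history: any increase over the constant baseline is anomalous.
        return today > baseline
    return (today - baseline) / spread > threshold


# Hypothetical daily download counts for one user.
history = [10, 12, 11, 9, 13, 10, 12]
print(direct_indicator({"clearance_level": 4}))  # static attribute
print(learned_indicator(history, today=40))      # far above the learned baseline
print(learned_indicator(history, today=12))      # within the normal range
```

Either way, the output is a single indicator value for a model node, so the model itself does not need to know how the indicator was derived.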

An approach that embraces the model-first ethos can lead to the development of a powerful security analytics solution that reasons like a team of expert analysts. With a well-designed software architecture and platform in place to orchestrate all of these actions, the system will operate transparently, consistently and reliably. And it will do so at a tempo and scale that no human analyst, or even a roomful of them, could long endure.

The net effect is that analysts can focus on higher-order threats, and organizations can more effectively anticipate their most complex security challenges.

This article is published as part of the IDG Contributor Network.