3 security analytics approaches that don’t work (but could) — Part 1

On their own, Bayesian networks, machine learning and rules-based systems don’t work well for security analytics. They don’t produce good results, don’t scale or are too hard to work with.

Digital technologies have changed the face of business and government, and they will continue to do so at an even faster pace. They drive innovation, boost productivity, improve communications and generate competitive advantage, among other benefits.

The dark side of this digital revolution has now come clearly into focus as well: McKinsey estimates that cyber attacks will cost the global economy $3 trillion in lost productivity and growth by 2020, while theft, sabotage and other damage inflicted by trusted insider personnel continue to cost organizations in lost revenues, revealed secrets and damaged reputations.

We read in articles and white papers that "security analytics" is the solution, the Next Big Thing that will spare beleaguered organizations the reputational, financial and physical costs of all kinds of threats. But the reality is that today’s security products are broken. The threats continue to morph and multiply. The attackers continue to outmaneuver or overwhelm the defenders. And the underlying doctrines and policies are far behind the times.

3 security analytics approaches that aren't as good as you think

Three widely deployed analytical approaches are often held up as shining exemplars in the brave new world of security analytics: Bayesian networks, machine learning and rules-based systems. Unfortunately, I see plenty of implementations where they simply don’t produce good results, don’t scale or are too hard to work with. Here’s a quick summary of the approaches.

1. Bayesian networks

Bayesian probability theory offers a transparent and analytically defensible way to estimate the likelihood of something happening (or not happening) and to update that estimate as new evidence arrives. A Bayesian inference network, or model, represents the variables of a problem and the conditional dependencies among them, and calculates the probabilities of possible outcomes mathematically. The harder the problem, the better it works, at least in theory.
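
To make the idea concrete, here is a minimal sketch of a two-evidence Bayesian network, written in Python. Everything in it is an assumption made for illustration: the node names (`off_hours_login`, `large_download`), the prior and the conditional probabilities are invented values, not measurements from any real deployment.

```python
# Hypothetical network: Threat -> OffHoursLogin, Threat -> LargeDownload.
# All probabilities are made-up illustration values, not measured data.

P_THREAT = 0.01  # assumed prior probability that a user is a threat

# Assumed probability of seeing each piece of evidence,
# keyed by whether the user is a threat (True) or not (False).
P_EVIDENCE = {
    "off_hours_login": {True: 0.60, False: 0.10},
    "large_download": {True: 0.70, False: 0.05},
}


def posterior_threat(observations):
    """Return P(threat | observations) by enumerating both threat states.

    `observations` maps an evidence name to True/False. Evidence that is
    absent from the dict is left out of the product, which marginalizes it
    and lets the network reason on incomplete data.
    """
    joint = {}
    for threat in (True, False):
        p = P_THREAT if threat else 1.0 - P_THREAT
        for name, seen in observations.items():
            p_seen = P_EVIDENCE[name][threat]
            p *= p_seen if seen else 1.0 - p_seen
        joint[threat] = p
    return joint[True] / (joint[True] + joint[False])


one_signal = posterior_threat({"off_hours_login": True})
two_signals = posterior_threat({"off_hours_login": True, "large_download": True})
```

With these assumed numbers, a single off-hours login raises the threat estimate from 0.01 to about 0.057, and adding a large download raises it to about 0.459. The output is a graded probability that can be traced back to each input, which is the transparency the approach promises.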

In reality, a typical approach is to gather a roomful of PhDs and spend a lot of time and money building a Bayesian network. Then, with even greater effort and more man-hours, the Bayesian network is turned into software by a roomful of coders. The resulting product is something the user struggles even to understand, let alone use.

Not surprisingly, there’s an emerging camp that claims Bayesian networks are old-fashioned and not suited to solving today’s security challenges, especially now that machine learning is available.

2. Machine learning

In Arthur Samuel’s classic definition, machine learning “gives computers the ability to learn without being explicitly programmed.” It can, for example, be used to uncover hidden insights from historical relationships and trends found in data.

While that may have excited early adherents, we’ve had over 50 years to discover some of its limitations:

  • There are no real, generalizable approaches to machine learning.
  • Correlation isn’t all it’s cracked up to be in a world of black-swan scenarios and asymmetric threats.
  • Machine learning depends on data, so it cannot offer solutions where data is scarce or nonexistent.
  • Most machine-learning solutions come "black boxed," and users who have to make and defend their critical decisions hate that.
  • Hasn’t science taught us to start with a hypothesis? There’s no such luxury with machine learning.
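
The data dependence in the list above shows up even in a toy learner. The sketch below, in Python, learns a cut-off from labeled history instead of having one programmed in. The function name `learn_threshold`, the feature (failed logins per hour) and the sample data are all hypothetical, and a one-feature threshold search stands in for far more elaborate models.

```python
def learn_threshold(samples):
    """Pick the cut-off that misclassifies the fewest labeled samples.

    `samples` is a list of (value, is_malicious) pairs. The learned rule is
    "flag if value >= threshold". Returns None when there is no data.
    """
    candidates = sorted({value for value, _ in samples})
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in candidates:
        # Count samples where the flag disagrees with the known label.
        errors = sum((value >= threshold) != label for value, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Made-up history: (failed logins per hour, session was malicious).
HISTORY = [(1, False), (2, False), (3, False), (4, False),
           (5, False), (9, True), (12, True), (15, True)]

threshold = learn_threshold(HISTORY)   # learned from the data
no_data = learn_threshold([])          # nothing to learn from
```

On this history the learner settles on a threshold of 9, a value nobody wrote into the code. With an empty history it returns `None`: without data there is nothing to learn, and an attack pattern that never appeared in the history cannot shape the threshold.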

3. Rules-based systems

Rules-based systems use "if-then" rules to derive actions. For example, if the fact that "Sally is 22 and unemployed" is matched to the rule "If a person is between 18 and 65 and is unemployed, they can claim unemployment," the system would conclude that "Sally can claim unemployment."
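
The Sally example can be sketched in a few lines of Python. The `Person` type, the `RULES` list and the `apply_rules` helper are hypothetical names chosen for this illustration; real rules engines are considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    employed: bool


# Each rule is an (if-condition, then-conclusion) pair.
RULES = [
    (lambda p: 18 <= p.age <= 65 and not p.employed, "can claim unemployment"),
]


def apply_rules(person):
    """Return the conclusion of every rule whose condition the facts match."""
    return [f"{person.name} {conclusion}"
            for condition, conclusion in RULES if condition(person)]


sally = Person(name="Sally", age=22, employed=False)
print(apply_rules(sally))  # prints ['Sally can claim unemployment']
```

Each rule either fires or it does not. There is no notion of "almost matched" or "matched with 70 percent confidence," and every special case needs another entry in `RULES`.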

While much simpler (and more common) than Bayesian networks and machine learning, rules-based systems nevertheless have their own inherent drawbacks. Because they’re typically binary, the outputs tend to be too coarse-grained for the often subtle threats they’re trying to detect and identify.

This leads to a proliferation of red flags (many of them false positives), which then leads to a proliferation of pricey analysts. Try instead to create rules for special cases, and you get a proliferation of rules. Paralysis reigns, and the world is still not safer.

A better way forward

Bayesian networks, machine learning and rules-based systems are applied successfully in many software systems across many domains. Google, for example, uses machine learning to recognize objects within an image and automatically create captions for them. So, clearly the techniques themselves are sound.

But they don’t typically work for security analytics, and that’s primarily because each technique’s weaknesses have yet to be resolved appropriately for that kind of application. Despite the limitations I describe above, each of these techniques offers unique strengths that would need to be present in an ideal security analytics solution:

  • Bayesian networks: Models that align with the concepts of the domain, and the ability to reason on incomplete data
  • Machine learning: Sheer power and ability to cope with massive quantities of data
  • Rules-based systems: Intuitive simplicity and ease of getting started quickly

What’s needed is a solution that exploits the combined strengths of these approaches while also compensating for or eliminating their individual drawbacks. In part 2 of this series, I will outline how these systems could—and should—be thoughtfully built, combined and applied to serve those who defend our security.

This article is published as part of the IDG Contributor Network.