Cybersecurity artificial intelligence hype is real

While artificial intelligence sounds like it will be the superhero of our future, it still depends on humans to distinguish the good from the bad. In essence, it is another detection-based solution that will always be challenged by the obfuscation techniques of rogue nation-states and well-funded criminal organizations.


Artificial intelligence (AI) and its sidekick, machine learning (ML), are depicted as comic book superhero characters in cybersecurity. Unfortunately, not everyone realizes that the ripped muscles beneath the spandex are special effects. Machine learning, the sidekick, is having far more impact on cybersecurity today than AI. AI and ML are not synonyms; Batman is not Robin.

It’s not that ML and AI aren’t doing meaningful things today. They are. However, the hype from vendors with super-sized marketing budgets, and from their co-dependents in the trade press (hype raises ratings), is making it difficult for non-experts with real-world problems to decide what to spend and what to expect.

Machine learning depends on “likeness” between the environment that produced the model and the real-world environment where it must tell good from bad, or normal from abnormal. The first “likeness” challenge: the greater the “unlikeness” between the environment where the ML model was developed and the one where it is deployed, the greater the prediction errors and false positives. The second: even if the model is trained and deployed in the same place, privacy and other concerns may exclude informative data, reintroducing “unlikeness.” The third challenge is time, which collides with an inescapable truth: with time comes change. Not only may our ML models not be ready for what comes next, but as time goes by, the cumulative differences between past and present degrade the model’s performance.
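To make the drift problem concrete, here is a minimal sketch, with invented feature distributions that stand in for no particular vendor's model: a toy detector learns a single decision threshold in a “training environment,” and its accuracy falls once the deployment environment shifts.

```python
import random

random.seed(42)

# Toy "detector": learn a single threshold on one feature (imagine file
# entropy) that separates benign from malicious samples in training.
def train_threshold(benign, malicious):
    # Midpoint between the class means -- a deliberately simple model.
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2

def accuracy(threshold, benign, malicious):
    correct = sum(1 for x in benign if x < threshold)
    correct += sum(1 for x in malicious if x >= threshold)
    return correct / (len(benign) + len(malicious))

# Training environment: the classes are well separated.
train_benign = [random.gauss(3.0, 0.5) for _ in range(1000)]
train_malicious = [random.gauss(6.0, 0.5) for _ in range(1000)]
t = train_threshold(train_benign, train_malicious)

# Deployment environment drifts: benign software starts looking more
# "malware-like" on this feature (e.g., legitimate packed executables).
field_benign = [random.gauss(4.5, 0.5) for _ in range(1000)]
field_malicious = [random.gauss(6.0, 0.5) for _ in range(1000)]

print(f"accuracy in training environment: {accuracy(t, train_benign, train_malicious):.2f}")
print(f"accuracy after drift:             {accuracy(t, field_benign, field_malicious):.2f}")
```

The model itself never changed; only the world did, which is exactly the “likeness” problem.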

Cybersecurity vendors undoubtedly considered the “likeness” challenges when they first started applying ML. They started with a relatively simple problem with little dependence on the deployment environment: telling good files from bad (malware) ones.

So, how well has machine learning detected malware?

Gartner’s report “The Evolving Effectiveness of Endpoint Protection Solutions” states that ML-based solutions “[do] not show an effectiveness consistently above EPP vendors nor the ability to maintain that in the long term.” In other words, ML fares little better than the free Windows Defender. In another Gartner report, “Comparing Endpoint Technologies for Malware Protection,” the analysts observed that other types of security controls are supplementing ML to protect endpoints, because ML “…can have a high FPR [false positive rate]. Models have to be tight enough to detect most malware, but loose enough to avoid false positives and operational inefficiencies.” As for effectiveness, ML “... works well for executables... but less so for weaponized PDFs and other office files.” The report also notes that ML tech is just as susceptible to obfuscation tactics (encrypted files and transmissions) as traditional tech. So, ML features no x-ray vision or telepathy.
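Gartner’s “tight enough… loose enough” remark is the familiar tradeoff between detection rate and false-positive rate. A hedged sketch with made-up score distributions (nothing here reflects any real product’s scores):

```python
import random

random.seed(0)

# Toy classifier scores: higher = more "malware-like". Benign files vastly
# outnumber malware, which is why even a small FPR hurts in practice.
benign = [random.gauss(0.3, 0.15) for _ in range(10_000)]
malware = [random.gauss(0.7, 0.15) for _ in range(1_000)]

def rates(threshold):
    """Detection rate and false-positive rate at a given alert threshold."""
    detection = sum(s >= threshold for s in malware) / len(malware)
    false_pos = sum(s >= threshold for s in benign) / len(benign)
    return detection, false_pos

# A looser threshold catches more malware but floods analysts with
# false positives; a tighter one quiets the noise but misses more.
for t in (0.40, 0.50, 0.60):
    d, f = rates(t)
    print(f"threshold {t:.2f}: detection {d:.1%}, false positives {f:.1%}")
```

Every vendor picks a point on this curve; no threshold makes both numbers good at once.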

In another similarity with traditional AV, researchers showed that several ML-enhanced tools were susceptible to sample mutations, whereby simple tools alter the characteristics of known malware to evade detection. Worse yet, they observed that several ML tools were not learning from the previous month’s test activity. While we may be amused that the ML tools did not learn, we should not be shocked: vendors must in some way “supervise” what their ML models learn, or the “unlikeness” of field data might devastate the model. We can expect improvement. For now, don’t expect machine learning in the field to dramatically accelerate detection of new or mutated malware.
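Mutation evasion is easiest to see in the degenerate case of an exact-match signature, the oldest form of detection; an ML model that leans on brittle features fails the same way, just less obviously. A toy illustration (the “malware” bytes are invented):

```python
import hashlib

# Signature "detector": flags a file only if its SHA-256 hash matches a
# known malware sample -- the most brittle form of detection.
KNOWN_MALWARE = {hashlib.sha256(b"MZ\x90\x00evil-payload").hexdigest()}

def is_flagged(file_bytes: bytes) -> bool:
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_MALWARE

original = b"MZ\x90\x00evil-payload"
mutated = original + b"\x00"  # append a single padding byte

print(is_flagged(original))  # True
print(is_flagged(mutated))   # False -- a one-byte mutation evades the signature
```

An ML model generalizes better than a hash, but the researchers’ point stands: cheap, automated mutations can still walk a known sample out of the region the model learned to flag.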

With ML alone not magically improving malware detection, and with the products requiring additional security controls to attain greater endpoint protection and rapid incident response, vendors are under pressure to challenge the “likeness” dilemma more aggressively. To squeeze out more detection, they are increasingly factoring in the “unlikeness” of each deployment. Anecdotes tell of deployments requiring rarefied expertise to fine-tune and maintain, myriad environment anomalies (heterogeneous endpoint and application configurations) that must be smoothed out, and some deployment phases lasting 18 months. And the “likeness” dilemma dictates that subsequent changes to the environment require re-tuning the ML model, something your average Windows administrator cannot do.

So what is an enterprise with cyber problems to do regarding machine learning-enhanced cyber tools?

First, continue to build your understanding of the ML “likeness” challenges. Second, seek out first-hand accounts from those who have actually deployed these tools. Third, test vendors’ assertions in your own environment. If doing so alone is too daunting, team up with one or more other organizations, with one or more VLANs of test endpoints representing your environment and others representing theirs. Yes, this might invoke the “likeness” dilemma.

Lastly, keep an eye on the adversary. Data gurus say ML has blind spots, and one researcher showed how he identified and exploited them to evade detection. Analysts say that once adoption of ML-enhanced cyber tools reaches 10 percent, adversaries will routinely outsmart the ML models in the wild. Stay tuned!

Copyright © 2017 IDG Communications, Inc.
