Navigating the AI hype in security: 3 dos and 2 don'ts

“Very few things that advertise AI have the goods under the hood. I think what people are touting as innovative AI is still very basic, and we can go a lot further.” — Rick Grinnell, founder and managing partner, Glasswing Ventures


I’ve been needling the artificial intelligence (AI) hype bubble since 2015 when, after managing a Caltech research grant, I saw a massive discrepancy between the attitudes of buzzword merchants and data scientists. Later, in 2017, I was the lone dissenting source in a Fox News piece overhyping the use of AI to solve fake news.

While I hate the hype, I love what’s coming with AI.

Early-stage investor Rick Grinnell offers a pragmatic assessment of the hype around AI and how it's really being used now. “From a real product technology standpoint, we’re still in the first inning of the game. Very few things that advertise AI have the goods under the hood," says Grinnell. "I think what people are touting as innovative AI is still very basic, and we can go a lot further.” Rick should know. He’s the founder and managing partner at Glasswing Ventures, which has focused on AI-enabled security companies for a number of years.

How does one navigate the hype and decide when to invest in AI and machine learning (ML)? Here are some dos and don’ts:

Don’t exhaust yourself chasing exotic math

The major misconception behind the hype is the belief that a sudden influx of advanced math has caused machines to think like humans.

During my initial AI/ML project in 2015, I remember being excited to see the new math behind the AI boom. I was quite surprised to see some of the world’s most talented data scientists, fresh off their work at CERN, using machine learning algorithms like “k-means” and “DBSCAN” — algorithms right out of my 20-year-old textbooks! As cybersecurity’s first wave of machine learning toolkits was built on data lakes, I eagerly peeked inside their ML libraries. No new math there either.
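To make the point concrete, here is a minimal sketch of that same textbook k-means (Lloyd's algorithm) in plain Python, run on invented two-column "telemetry" — a toy illustration of how old the underlying math is, not a security product:

```python
def kmeans(points, k, iterations=20):
    """Textbook Lloyd's k-means on 2-D points: assign, then re-center."""
    # Deterministic seeding for this toy: take k evenly spaced points.
    centers = points[:: max(1, len(points) // k)][:k]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centers[i][0]) ** 2
                                        + (y - centers[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # Update step: each center moves to its cluster's centroid.
        centers = [(sum(x for x, _ in c) / len(c),
                    sum(y for _, y in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Toy "telemetry": two obvious blobs of (requests/min, bytes/request) pairs.
normal = [(10 + i % 3, 200 + i % 5) for i in range(20)]
noisy = [(90 + i % 3, 900 + i % 5) for i in range(20)]
centers, clusters = kmeans(normal + noisy, k=2)
```

The whole algorithm is two alternating steps a first-year student could follow, which is exactly the point: what production toolkits add is engineering and scale, not new math.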

Cylance popularized AI-based file heuristics. It’s probably the most successful large-scale implementation of machine learning in cybersecurity, and they’re pretty open about how they did it. Pragmatic usage of neural networks, deep learning, and solid engineering appears to be their recipe, not exotic and cutting-edge math.

At the risk of sounding too snarky, when it comes to data science in cybersecurity, you’re likely to be using math older than the C++ programming language. So don’t feel like you have to spend all your loot on expensive data scientists from MIT and the NSA. Or that you have to buy only from vendors who tout exotic algorithms.

Do get excited about machine vision and natural language processing

Where you will find new math is in the specific areas of machine vision and natural language processing (NLP). Both will have immense impact on cybersecurity down the road.

Cars now see the road, and software can recognize faces and objects. Machine vision has everything to do with security. Authentication and firewalls offer no protection in a world where IoT and mobile devices carry cameras and microphones. Any device in physical proximity can see, hear, and collect sensitive data.

Cybersecurity will inevitably merge with physical security in the coming years. Endpoint telemetry, logs, and network data will likely be supplemented by security cameras and device webcams to detect interloping people and devices. As a side note, the surrounding privacy debate will be huge.

Security is finally becoming data oriented with the advent of GDPR and data lifecycle management. Now the industry needs to confront the magnitude of the data we secure. It took all the people in your company — and likely many more than that — to create all of your organization’s data. No small team of humans can make sense of that much data, understand its levels of privilege, or set the granular priorities for defending it.

Fortunately, AI is bringing the basic meaning behind language within the grasp of software, through techniques like NLP. ML approaches to modeling unstructured content will also be an important investment and have been used in eDiscovery for almost a decade.
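The bag-of-words modeling underneath much eDiscovery-style content analysis is less mysterious than it sounds. A toy sketch, with invented document strings: count terms, then score cosine similarity between the count vectors.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words: lowercase whitespace tokens -> term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical documents: two about data handling, one unrelated.
policy = vectorize("data retention policy for customer records")
memo = vectorize("memo on retention of customer data records")
lunch = vectorize("the lunch menu for friday is pizza")
```

Here `cosine(policy, memo)` scores far higher than `cosine(policy, lunch)`, which is the core trick: related documents cluster together without the software "understanding" a word. Real systems layer stemming, TF-IDF weighting, and learned embeddings on top, but the principle is this simple.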

Don’t underestimate data’s role in driving the AI revolution

The AI/ML deployed in cybersecurity today has been used for 30 years in anti-fraud and military applications and for a couple of decades by search engines like Google. If new math isn’t driving the AI buzz, what is?

The long migration to the cloud brought cheaper storage and the processing power to pull off data curation, training, and algorithm execution at scale. Further, the rise of SaaS and managed services consolidated the data of many client companies into a single provider’s datastore. These data-rich providers were then incentivized to deliver new insights with AI/ML.

“It’s only in the last ten years, or even five years, that you have a lot [of] data with high quality, and we have the computational power to run complex algorithms on it. It’s only now that you can take these old algorithms that, by the way, work well and apply them at scale,” says Alon Kaufman, co-founder and CEO of Duality Technologies, which offers privacy-preserving analytics and AI on data while it’s encrypted.

And data is a unique advantage that security professionals have over the black hats. “In using machine learning and artificial intelligence and data to do better detection, it’s asymmetric in that it benefits the people with the best data signals. It’s asymmetric relative [to] the adversary,” says Microsoft CISO Bret Arsenault.

Over the past four or five years, instead of pushing vendors for their data scientists' credentials and insight into their algorithms, we should have been asking about their data. Does a vendor’s anomaly detection learn your environment, or is it trained on the vendor's own data? If it's trained on vendor data, you might want to dig into those data sets and see whether they’re representative of your network.
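What "learning your environment" means can be shown with even the most trivial baseline: fit simple statistics to counts observed on your own network, then flag outliers. A sketch with hypothetical numbers and host names:

```python
import statistics

def zscore_anomalies(baseline, current, threshold=3.0):
    """Flag hosts whose current count sits far outside the mean/stdev
    learned from this environment's own historical baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1.0  # avoid divide-by-zero
    return {host: (count - mean) / stdev
            for host, count in current.items()
            if abs(count - mean) / stdev > threshold}

# Baseline: failed logins per hour observed on *your* network last month.
baseline = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2]
today = {"web-01": 3, "db-02": 4, "vpn-gw": 48}
flags = zscore_anomalies(baseline, today)
```

The mean and standard deviation here are the entire "model," and they are only meaningful because they came from your telemetry. Swap in a vendor's baseline from a very different network and the same arithmetic produces very different flags — which is exactly why the provenance of training data is worth interrogating.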

It’s easy to acquire malware file samples to train next-gen antivirus against, but how does a startup build a training set of ephemeral execution behaviors? You should ask.

Do things the easy and efficient way, even if it’s not buzzword compliant

AI/ML is not a sure thing. Traditional software development involved domain experts writing requirements and programmers implementing domain logic to solve problems. At the beginning of a traditional software project, it wasn’t difficult to estimate the likelihood of producing software that does something useful, even if buggy or lacking in elegance.

Data science is entirely different. Even if you have top talent and do everything right, it’s harder to predict the degree of success. The data may not consolidate into any pattern that can be modeled. It may predict something wrong because correlation is not causation. Or it may tell you something you already know to be true. "You may find yourself doing an amazing data science project crunching numbers away just to come up with a trivial, and irrelevant, result like 49.5% of the people in the world are men," says Kaufman.

Unfortunately, a common vendor strategy these past years has been to take what was already done with a rule or a basic heuristic and convert it to ML to achieve buzzword compliance. They likely spent more to deliver the same thing.

“If you know how the thing works, you don’t have to do machine learning – you can simply go program it. If you don’t know, that’s exactly where you need machine learning, which is a very clever way of doing an effective heuristic,” says Kaufman.
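Kaufman's distinction is easy to illustrate. When the detection logic is fully known, a plain rule is cheaper, faster, and more auditable than any model; the example below is hypothetical (invented account convention and field names), but shows the kind of check that needs no ML at all:

```python
# A detection whose logic is fully known: no model needed, just a rule.
def flag_login(event):
    """Flag interactive logins by service accounts outside business hours."""
    return (event["account"].startswith("svc-")
            and event["interactive"]
            and not 9 <= event["hour"] < 17)

events = [
    {"account": "svc-backup", "interactive": True, "hour": 3},   # suspicious
    {"account": "svc-backup", "interactive": False, "hour": 3},  # scheduled job
    {"account": "alice", "interactive": True, "hour": 3},        # out of scope
]
```

Rewriting a three-line rule like this as a trained classifier adds cost, opacity, and a training-data dependency while delivering the same verdicts. ML earns its keep only where no one can write the rule down.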

Do find data scientists without killing yourself

They say cybersecurity has zero unemployment. It’s hard enough to find a security analyst, let alone a PhD data scientist with cybersecurity experience. “I think every company at this point is looking for that data scientist or team of data scientists because they think they have to. It’s a corporate arms race,” says Grinnell.

Fortunately, the industry is working on tools to visualize and model data, bringing AI/ML within reach of a broader audience of data engineers and “citizen data scientists.” (Full disclosure: my employer, OpenText, makes such an ML and predictive analytics platform.)

In addition to using such tools, the security industry should tap into academia more often. Academic researchers are surprisingly affordable. They already have day jobs and salaries, so it’s often easier to hire them part-time than it is to hire part-time consultants. While you might only remember professors giving you a hard time for late homework, academics are actually quite easy to work with. They do, after all, spend a portion of their time explaining technology to students who are just out of high school.

Conclusion

We've all seen vendors do what Grinnell calls "spin[ning] up a machine learning story, even though their product really doesn’t need it, or have it at this point."

This isn't good for security.

Instead, what the security industry needs now is to settle into an understanding of where AI will revolutionize cybersecurity and where it’s not yet worth our efforts.
