Most CISOs likely believe in the potential for artificial intelligence (AI) and machine learning (ML) to transform the information security landscape over the next 3 to 5 years. That doesn't mean they aren't sick and tired of hearing about it. Many would probably give serious consideration to trading a paycheck for never having to hear the terms AI or ML again. Virtually every security software vendor on the planet is invoking artificial intelligence as if it had magical properties. Worse, many vendors don't have the goods.
Do some security software vendors overpromise and underdeliver on the benefits of their AI/ML implementations? "Massively," said Dr. Anton Chuvakin, VP and distinguished analyst, Gartner. "Examples range from the more blatant and idiotic 'we have a military-grade AI' to subtle, saying 'we use AI' when they are referring to the use of foundational statistical methods that are 300 years old."
The cybersecurity tools market has played up the term AI to the point that "CISOs and CIOs roll their eyes when they hear about yet another AI-based product," said Tom Bain, VP of security strategy, Morphisec. "I know one vendor that mentions AI 22 times on its homepage."
Dr. JT Kostman, leader, applied artificial and frontier technologies, Grant Thornton, said "most of the companies claiming to have AI/ML intelligence capabilities that I've evaluated have ended up having to admit that their claims were little more than marketing puffery."
That willingness on the part of some vendors to exaggerate or fabricate an AI story is only part of the problem. Almost 60 percent of the IT respondents to a new study conducted for Webroot admit that, while they are aware that some of their software makes use of AI or ML, they're not sure what that means. Moreover, only 36 percent know with certainty how their cybersecurity vendors source and update their threat data. The survey was fielded in late November and early December 2018 and reached 400 director-level and above IT professionals, 200 in the U.S. and 200 in Japan.
Many experts and AI-experienced CSOs strongly urge infosec leaders to get in the game now, so that when AI becomes an absolute necessity you're not playing catch-up with complicated technology. For example, many people grossly underestimate the amount of data needed to train a machine learning model properly, and it can take a while to build up that data. "The mistake that many people make is that AI is about the sophistication of the algorithm. It's not. The key is that AI/ML requires massive amounts of data for training," said Thomas Koulopoulos, chairman and founder, Delphi Group.
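To make the data-volume point concrete, here is a minimal sketch (assuming Python and scikit-learn, with synthetic data standing in for real security telemetry) of a learning curve, one common way to see whether a model's detection performance is still improving as it is fed more labeled examples:

```python
# Rough check of how much labeled data a model actually needs: plot a
# learning curve and see where the validation score plateaus. Synthetic,
# imbalanced data stands in for real alert or flow records here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05],  # rare positives, like most security data
                           random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="f1",
)

for n, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{n:>5} training samples -> mean F1 = {score:.3f}")
```

If the validation score is still climbing at the largest training size, the model is likely starved for data, which is exactly the trap Koulopoulos describes.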
Niall Browne, SVP trust and security and CISO at Domo, suggests that you do your homework now without necessarily buying. "Smart CSOs are in the learning phase with AI/ML," Browne said. "They are soaking up as much as they can about AI technology now. They are talking to vendors to understand the capabilities and limitations of the products. They will then be ready to make an informed, risk-based decision when AI is showing [more] promise."
To help you with this process, we've polled the experts and put together a list of 11 questions you can draw from when talking to security vendors. They should help you separate the wheat from the chaff (i.e., the potentially useful from the merely hyped) among security software offerings. Credit where it's due: John Omernik, distinguished technologist, MapR, who was interviewed for this story, was the originator of a list of five AI/ML topics to discuss with vendors. This list is based on the ideas and insights of all the experts we talked to, but Omernik was the foremost contributor.
11 questions to ask about AI/ML-based security software
1. How do I know that the training data is representative?
You want to know what data was used to train the vendor's models so that you can determine if that data is representative of your data and the behavior you will see on your network, suggests Aaron Sant-Miller, senior lead data scientist, Booz Allen Hamilton.
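One way to pressure-test the vendor's answer is to compare a sample of its training data against your own telemetry. The sketch below (assuming Python with NumPy and SciPy; the arrays are hypothetical stand-ins for a vendor-supplied sample and your own flow data) uses a two-sample Kolmogorov-Smirnov test to flag a feature whose distribution differs sharply between the two datasets:

```python
# Sketch of a representativeness check: compare the distribution of one
# feature in the vendor's training sample against the same feature drawn
# from your own network telemetry. The arrays are stand-ins for real data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
vendor_bytes_per_flow = rng.lognormal(mean=8.0, sigma=1.0, size=10_000)
our_bytes_per_flow = rng.lognormal(mean=9.5, sigma=1.4, size=10_000)

stat, p_value = ks_2samp(vendor_bytes_per_flow, our_bytes_per_flow)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3g}")
# A large KS statistic (and tiny p-value) suggests the model was trained on
# traffic that looks very different from what it will actually be scoring.
```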
2. How fresh, clean and learnable is the training data?
You'll want to know how frequently the training data set is refreshed, says Koulopoulos. "How does it learn and evolve its detection abilities over time? How much data was required to train the AI/ML engine adequately? What you're trying to get to is the degree to which the AI system learns and how much data is needed for it to learn and relearn," he added.
3. Can you get performance metrics?
This was the most commonly suggested question by the people interviewed for this article. "The vendor should be able to share the results of double-blind controlled experiments that detected world-class hackers or pen testers trying their best to breach a system," said Kostman.
"Determining whether a vendor is using machine learning instead of just an algorithm can be accomplished with the metrics used to measure the performance of the vendor's ML model," explained Marzena Fuller, CSO, SignalFX. Such metrics should also characterize the accuracy of the model.
For supervised models, Fuller recommends asking about the "Confusion Matrix." She adds that a value close to 1 represents high accuracy.
"Evaluating the performance of unsupervised models is more challenging," Fuller said. "A relatively small value for intra-cluster distances and a relatively large value for inter-cluster distances indicates that the model is effective at grouping like items with discrete characteristics."
4. Can you get a real-world demonstration?
If the vendor doesn't have hard metrics, think about walking away. But if you want to give them another chance, Chuvakin suggests you make this request: "Show me an example where your AI solution can make a better decision than my SOC analyst." He and others also recommended asking for customer references.
5. Does a proprietary model mean you can't customize it?
When the vendor claims a proprietary AI/ML implementation will "solve all the problems," Omernik suggests that CISOs and CSOs ask, "Can the customer customize it?" If so, what level of training would your engineers need to make those customizations? Can different models work on the same data, or can your data only be worked on by models that are bundled with the security product?
6. How flexible is the vendor's AI/ML implementation?
Omernik suggests asking these questions to determine how flexible the implementation is: Can a vendor's AI/ML implementation work with different types of data, such as log, audio, video, transactional and so on? If so, can the data sets work together, or must they be separate?
7. What about updates to the AI/ML solution?
You want to know whether you'll have to pay incrementally or buy a new version of the security application to get updates. Also ask how the vendor distributes such improvements to customers and how difficult it is to integrate them.
8. Will the vendor's solution be a "black box" to your security team?
Being a black box isn't a straightforward pro or con. But you will want to know if it supports applying the latest AI/ML toolkits and how your team will work with it. "Will the tool help practitioners learn about how data works and help them expand their understanding of data engineering and data science? Or is it a black box solution that forces the customer to rely on the vendor to make changes?" said Omernik.
"For many clients, a black box is much better than an open-ended toolkit," said Chuvakin. The latter "screams 'years of consulting before any hope of value.' You say black box; I say day-one value."
9. How was AI integrated into your offering?
"Was it acquired, built in-house, or a part of software you've been using from the outset?" Koulopoulos said. "The general caution here is to be leery of bolt-on AI. Just using [Google's] TensorFlow doesn't qualify."
10. How does the vendor's system detect novel types of attacks?
"How does it contend with what is known as the cold-start problem?" Kostman said. Machine learning algorithms need data the same way a fish needs water, he added. "So, how can the vendor's AI-based system identify threats that are unlike anything it's ever encountered before?"
11. Who owns the data?
Be careful with your data. "The primary goal for AI vendors currently is not to sell, but instead, to gain access to as much data as possible to test and improve their [models and] algorithms," said Browne. "It's important to understand what level of access there will be to your data and systems — and who owns the resulting AI metadata."
Koulopoulos agrees, adding a similar point: "One of the biggest hotbeds of controversy concerns the ownership of the training data, which accumulates over time."
Advice for CISOs
Measuring the effectiveness of your AI-based solutions is one of the most important things you can do. But to do it right, you need expertise in that area. "Every company should have a data scientist on staff," said Fuller. "A CSO who plans to incorporate ML solutions extensively should consider hiring both a data scientist and a data engineer."
While we're talking about staff, if your tech people have the knowledge and training to help you evaluate AI-based security products, trust them, advises Omernik. "At some companies, executives need to find a way to trust their technical people, who will be talking to any vendors who might be trying to sell a product." You can't trust everyone, he adds, but you need to find or hire at least one person with the experience to cut through the vendor hype and know what's what. The AI/ML talent gap means such people will be scarce and in very high demand, so get out ahead of it. Omernik suggests asking yourself, "What am I doing to attract talent? How am I supporting my technical people?"
You may not realize it, but you are in danger of falling precariously behind. "Over the next 3 to 5 years, if you're not in the 90th percentile or higher in knowledge and experience on how AI and ML can be used to defend and fight cybercrime, you'll be risking your organization and career," Koulopoulos said.