AWS raises machine learning expectations for cloud security

AWS's new GuardDuty and Macie offerings unleash the power of machine learning to secure your data. Are they right for your enterprise?


Turning on machine-learning-based cloud security tools like Amazon Web Services' (AWS) new GuardDuty and Macie offerings might be a no-brainer for AWS customers. Doing so raises the bar for attackers but will not protect you from sophisticated adversaries, experts say.

The AWS Macie service, announced in August, trains on the content of users' Amazon S3 buckets and alerts customers when it detects suspicious activity, with a focus on PCI, HIPAA, and GDPR compliance. AWS GuardDuty, a complementary offering announced at the end of November, uses machine learning to analyze AWS CloudTrail, VPC Flow Logs, and AWS DNS logs. Like Macie, GuardDuty focuses on anomaly detection to alert customers to suspicious activity.

"From a technical point of view this is amazing," Clarence Chio, author of the forthcoming O'Reilly book Machine Learning and Security, says. "Anytime a horizontal platform provides a service like this, [it is] providing something no one else has the ability to provide."

A machine learning model consists of an algorithm and training data, and the model is only as good as the data it's trained on. That is where cloud providers have an edge. A provider like Amazon has visibility across its entire network, making it much easier to train its machine learning model on what is normal and what might be malicious. "Algorithms are never secret or proprietary for long, but data sources are the most valuable in any of these offerings," Chio adds.

While threat intelligence sharing among organizations is growing more common, the data any single enterprise is likely to possess falls far short of the data available to a cloud provider like Amazon. This concentration of useful threat intelligence will probably accelerate enterprise migration from the data center to the cloud.

There are some gotchas, however.

Machine learning raises the bar for attackers

A machine learning model is only as good as its training data, which also means it is less effective at detecting things it has never seen before, so-called "black swan" events. "There are so many things wrong about how machine learning is portrayed," Hyrum Anderson, technical director of data science at Endgame, says. "When you boil down all the hype, machine learning gives you automation — you give it data, and it tells you what to look for, instead of providing it to a human and poring through all that data."

AWS's CISO, Stephen Schmidt, tacitly admitted this in a press release. “By using machine learning to understand the content and user behavior of each organization,” he said, “Amazon Macie can cut through huge volumes of data with better visibility and more accurate alerts, allowing customers to focus on securing their sensitive information instead of wasting time trying to find it.”

He's right. Services like Macie and GuardDuty are an excellent way to catch low-hanging fruit, such as the improperly configured S3 buckets that threaten enterprise data stored in the cloud. Many of the data breaches seen in 2017, such as the exposure of classified US Army/NSA INSCOM files, millions of data analytics records on American voters, and the Verizon breach, may well have been prevented by Amazon's new machine-learning-based cloud security.
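To give a sense of what "improperly configured" means in practice, here is a minimal Python sketch that flags a bucket access control list (ACL) granting access to everyone. It is not how Macie or GuardDuty work internally, which AWS has not detailed here. It assumes the ACL has the shape of an S3 GetBucketAcl response, the example ACL is made up, and it covers ACLs only (a bucket policy can also expose data).

```python
# Group URIs that S3 uses for "everyone" and "any authenticated AWS user".
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_grants(acl: dict) -> list:
    """Return the permissions an ACL grants to public groups.

    `acl` is assumed to look like an S3 GetBucketAcl response:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    return [
        grant["Permission"]
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
    ]

# Hypothetical ACL for illustration: the owner has full control,
# and anyone on the internet can read the bucket.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

print(public_grants(example_acl))
```

A non-empty result means the bucket is readable or writable by people outside the organization, which is the kind of mistake behind several of the 2017 exposures.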

Experts warn, though, that machine-learning classification against an adaptive adversary is an unsolved problem, and machine-learning-based cloud security measures are not likely to be effective against sophisticated adversaries.

Machine learning's ability to classify malware probabilistically, for example, is a significant improvement over traditional antivirus malware signatures, which either match or do not match in a binary fashion. Machine-learning-based malware detection can instead classify with a degree of uncertainty (e.g., "this executable has an 80 percent chance of being malicious") and then refer that file to a human for further inspection.
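The difference between a binary signature match and a probabilistic score can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the one-entry signature set, the 0.5 and 0.9 thresholds, and the score itself, which would come from a trained model in a real product.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad files.
KNOWN_BAD = {hashlib.sha256(b"evil payload v1").hexdigest()}

def signature_verdict(file_bytes: bytes) -> bool:
    """Binary verdict: the digest matches a known signature or it does not."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD

def triage(p_malicious: float, review_at: float = 0.5, block_at: float = 0.9) -> str:
    """Map a model's probability score to an action (thresholds are illustrative)."""
    if not 0.0 <= p_malicious <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_malicious >= block_at:
        return "block"
    if p_malicious >= review_at:
        return "human review"
    return "allow"

# A one-byte change defeats the signature, while a score of 0.8 from a
# (hypothetical) model still sends the file to an analyst.
print(signature_verdict(b"evil payload v1"))  # True
print(signature_verdict(b"evil payload v2"))  # False
print(triage(0.8))                            # human review
```

Where to set the two thresholds is a judgment call: lower values send more files to analysts, higher values let more borderline files through.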

Using machine learning to detect malicious activity also remains in its infancy. While machine-learning-based cloud security raises the bar for an attacker, it is ineffective against sophisticated adversaries capable of varying their attack playbook. Anomaly detection is harder than it sounds, Anderson notes, pointing out that there is always a tradeoff between true positive and false positive rates. "It's easy to find 'unusual,'" he says. "The problem is that almost everything is unusual in some way. To sort out the malicious from the unusual is the real challenge there."
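The tradeoff Anderson describes shows up as soon as you pick an alert threshold. The Python sketch below uses ten made-up anomaly scores (three truly malicious events and seven that are merely unusual) and computes both rates at two thresholds. The data and thresholds are invented for illustration and do not come from any AWS service.

```python
def rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at an alert threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

# Made-up anomaly scores; label 1 = actually malicious, 0 = merely unusual.
scores = [0.95, 0.85, 0.60, 0.90, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.8):
    tpr, fpr = rates(scores, labels, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

On this toy data, lowering the threshold from 0.8 to 0.5 catches all three malicious events instead of two, but the number of false alarms rises from one to three of the seven benign events. At the scale of a real network, that second number is what buries analysts.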

What does an adaptive adversary look like?

In cutting-edge research published at the beginning of December, researchers at MIT demonstrated their ability to fool Google's InceptionV3 machine-learning image classifier. The researchers 3-D printed a turtle that, from most angles, fooled the machine-learning model into classifying the turtle as a rifle.
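The underlying idea, nudging an input in exactly the direction the model is most sensitive to, can be shown on a toy model. The Python sketch below is not the MIT researchers' method and has nothing to do with InceptionV3. It assumes a hypothetical four-feature logistic classifier with made-up weights and applies a single fast-gradient-sign-style step to push a "malicious" sample under the detection threshold.

```python
import math

# Hypothetical weights of a trained linear classifier.
# A score above 0.5 means "malicious".
WEIGHTS = [2.0, -1.5, 0.5, 1.0]
BIAS = -1.0

def predict(x):
    """Logistic-regression probability that feature vector x is malicious."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def evade(x, epsilon):
    """Shift each feature by epsilon against the sign of its weight.

    For a linear model, the gradient of the score with respect to the input
    has the same sign as the weights, so this is one gradient-sign step
    taken in the direction that lowers the malicious score.
    """
    return [xi - epsilon * math.copysign(1.0, w) for xi, w in zip(x, WEIGHTS)]

sample = [1.0, 0.2, 0.5, 0.6]            # made-up feature values
adversarial = evade(sample, epsilon=0.4)

print(round(predict(sample), 3), round(predict(adversarial), 3))
```

The original sample scores above 0.5 and the perturbed one scores below it, even though no single feature moved by more than 0.4. An attacker who can probe or approximate a model can search for such changes deliberately, which is what makes an adaptive adversary different from random noise.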

If academic researchers are able to fool Google's state-of-the-art machine-learning model, you can safely assume that nation-state intelligence agencies have had this capability for some time and that they possess the technical capability to defeat machine-learning models designed to detect malicious network activity. Maybe you don't have nation-state adversaries in your threat model, or think you don't. But as noted security expert Bruce Schneier likes to say, today's academic attacks are yesterday's nation-state attacks and tomorrow's criminal attacks. Attacks only get easier over time, never harder. We may, therefore, anticipate that garden-variety criminals will also be able to fool machine-learning-based security tools in the medium term.

That doesn't mean Amazon's Macie and GuardDuty have no value — quite the opposite. Defensive security is about raising the cost for an attacker, and these machine-learning-based security tools deliver that.

Banish the hype

The intersection of machine learning and security is frothy, to put it mildly. Neither uncritical enthusiasm (“AI is our savior!”) nor nihilistic despair (“machine learning is garbage”) is a productive attitude. “Don't throw the baby out with the bathwater,” Anderson says. “Educate users to ask questions, and educate marketing people to answer them.”

Attacks will only get faster, and the volume of threat intel will only grow. Evaluating threats and responding to them in real time will require automation. Like it or not, machine learning is here to stay.
