Amazon Macie automates cloud data protection with machine learning

Amazon promises AWS S3 customers that they will be able to identify and protect sensitive data faster with Macie, but is it enough to catch up to what Microsoft and Google offer?

Amazon offers a number of excellent tools to help enterprises keep their data and applications safe in the cloud. Last year, Amazon unveiled Amazon Inspector, its host-based application vulnerability assessment tool, which monitors what is installed and configured on each virtual instance. This year, it’s Amazon Macie, a security service designed to automatically discover and protect sensitive data stored in AWS.

As organizations move more of their data to Amazon’s various cloud offerings, security teams have the unenviable task of continuously tracking that data to identify, classify and protect sensitive pieces of information such as personally identifiable information (PII), protected health information (PHI), regulatory documents, API keys, secret key material and intellectual property.

Amazon Macie automates what has traditionally been a labor-intensive task by using machine learning to understand where sensitive information is stored and how it is accessed. Macie dynamically analyzes all attempts to access data and flags anomalies, such as large amounts of data being downloaded, uncommon login patterns, or data showing up in an unexpected location. Macie can also alert when someone accidentally makes sensitive data externally accessible or stores credentials insecurely.

“Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there, once your data has been classified by Macie, it assigns each data item a business value, and then continuously monitors the data in order to detect any suspicious activity based upon access patterns,” Tara Walker, AWS tech evangelist, wrote on the Amazon Web Services blog.

Macie is currently available only for S3; support for other AWS data stores will come later in the year.

Understanding Macie

Amazon Macie applies predictive analytics algorithms to authentication data such as location, times of access and historical patterns to develop a baseline for how each piece of data is used. To use Macie, administrators have to enable the appropriate IAM (identity and access management) roles created for the service. Amazon has created sample AWS CloudFormation templates to set up the necessary IAM roles and policies.
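To make the idea of a baseline concrete, here is a minimal sketch of one common way to flag a departure from historical behavior: compare a new observation against the mean and standard deviation of past values. This is not Macie's algorithm, which Amazon has not published in detail; the function name, the threshold and the sample figures are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Return True if `value` sits more than `threshold` standard
    deviations above the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past value was identical, so any increase stands out.
        return value > mu
    return (value - mu) / sigma > threshold


# Hypothetical daily download volumes (in MB) for one user.
daily_downloads_mb = [120, 135, 110, 128, 140, 125, 131]

print(is_anomalous(daily_downloads_mb, 130))   # an ordinary day
print(is_anomalous(daily_downloads_mb, 5000))  # a bulk download
```

A production system would weigh many signals at once (location, time of day, the sensitivity of the data touched) rather than a single number, but the principle of comparing new activity with a learned norm is the same.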

Instead of continuously scanning S3 buckets to find new data that needs to be classified, Macie uses event data from AWS CloudTrail to check for all PUT requests into S3 buckets. This way, data is classified automatically as it is added to the buckets. Macie uses the file metadata, the file contents and what it has learned about similar files in the past to classify the data. It doesn't rely on patterns that recognize only known data types, such as PII, but can also look at things like source code. After classifying the data, Macie assigns a risk level between 1 and 10, with 10 being the highest risk and 1 the lowest.
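The event-driven approach can be sketched in a few lines. The example below filters a list of CloudTrail-style records for successful PutObject calls and returns the objects that would need classifying. The field names (eventSource, eventName, errorCode, requestParameters) follow the CloudTrail record format for S3 data events; the function name and sample records are hypothetical, and this is an illustration of the filtering step only, not Macie's implementation.

```python
def new_s3_objects(records):
    """Return (bucket, key) pairs for successful S3 PutObject calls
    found in a list of CloudTrail-style event records."""
    found = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue  # not an S3 event
        if record.get("eventName") != "PutObject":
            continue  # only simple PUTs; multipart uploads are ignored here
        if "errorCode" in record:
            continue  # the request failed, so nothing was written
        params = record.get("requestParameters") or {}
        bucket = params.get("bucketName")
        key = params.get("key")
        if bucket and key:
            found.append((bucket, key))
    return found


# Hypothetical sample records.
sample = [
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "requestParameters": {"bucketName": "hr-docs", "key": "payroll.csv"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "requestParameters": {"bucketName": "hr-docs", "key": "payroll.csv"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "errorCode": "AccessDenied",
     "requestParameters": {"bucketName": "hr-docs", "key": "blocked.txt"}},
]

for bucket, key in new_s3_objects(sample):
    print(f"classify s3://{bucket}/{key}")
```

Reacting to write events means each object is examined once, when it arrives, instead of being rediscovered on every full scan of the bucket.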

“Since we started using Amazon Macie, we’ve found that it is flexible enough to solve a range of challenges that would have previously required us to write custom code or build internal tools, such as securing PII and alerting us to access anomalies, helping us move fast with confidence,” says Patrick Kelley, senior cloud security engineer at Netflix. The video streaming service is no stranger to building custom tools when necessary.

Macie can also be integrated with AWS CloudWatch Events and Lambda, which is useful for compliance work. For example, organizations have to comply with the European Union’s strict privacy regulation, the General Data Protection Regulation (GDPR), by May 2018. Because Amazon Macie recognizes personally identifiable information (PII), organizations can use the Macie dashboard to show compliance with GDPR requirements around encryption and pseudonymization of data. Macie alerts can be combined with Lambda functions to remediate GDPR issues.
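As a rough sketch of what such an integration might look like, the Lambda handler below inspects an incoming CloudWatch Events notification and decides whether it warrants remediation. The handler signature is the standard one for Python Lambda functions, but the event shape is an assumption: the "aws.macie" source value, the "risk-score" field and the threshold of 8 are illustrative and should be checked against the events Macie actually emits. The remediation itself (for example, tightening a bucket policy) is deliberately left out.

```python
RISK_THRESHOLD = 8  # hypothetical cutoff chosen for this sketch


def lambda_handler(event, context):
    """Decide what to do with one CloudWatch Events notification.

    Returns a small dict describing the decision so the logic can be
    tested locally without any AWS calls.
    """
    if event.get("source") != "aws.macie":  # assumed source value
        return {"action": "ignore", "reason": "not a Macie event"}

    detail = event.get("detail", {})
    risk = detail.get("risk-score", 0)  # assumed field name

    if risk >= RISK_THRESHOLD:
        # A real function would call a remediation routine here.
        return {"action": "remediate", "risk": risk}
    return {"action": "log", "risk": risk}


# Local usage with a hypothetical event; Lambda supplies `context` in production.
example_event = {"source": "aws.macie", "detail": {"risk-score": 9}}
print(lambda_handler(example_event, None))
```

Keeping the decision separate from the remediation call makes the policy easy to test and audit, which matters when the function is part of a compliance story.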

Catching up on security

Despite dominating the cloud services market, Amazon has lagged behind Microsoft and Google in offering security tools that are turned on by default. Amazon Web Services provides a comprehensive set of security tools, but they are effective only if administrators actually take advantage of them to secure their instances. In contrast, Microsoft has integrated management tools into its Azure platform, and Google provides many security features by default in Google Cloud Platform. Amazon’s latest moves help close some of the gap.

Turning on AWS CloudTrail, a governance, compliance and auditing service for AWS accounts, by default is a particularly welcome change. CloudTrail provides visibility into everything that happens under the account, and is extremely helpful for understanding what changes were made, by whom, and when. The problem was that too many administrators found out too late that CloudTrail was not turned on; it doesn't collect data unless it was enabled at the time the instance was created. With the change, all customers now get visibility into the last seven days of account activity by default, without having to configure the service.

Amazon is also adding rules to its AWS Config service that evaluate bucket configurations to help secure S3. Considering the number of data exposures this year alone that arose because S3 buckets were not configured correctly, these rules should help identify buckets that allow global read/write access before they become problems.
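The kind of check involved is simple to illustrate. The sketch below takes a bucket ACL in the shape returned by the S3 GetBucketAcl API and reports which permissions are granted to the global AllUsers group. It is a local illustration of the idea, not the AWS Config rule itself; the function name and the sample ACL are hypothetical, and a full audit would also need to examine bucket policies.

```python
# URI that S3 uses to represent "everyone on the internet" in an ACL grant.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def public_permissions(acl):
    """Return the sorted list of permissions an ACL grants to AllUsers."""
    perms = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") == ALL_USERS_URI:
            perms.add(grant.get("Permission"))
    return sorted(perms)


# Hypothetical ACL: the owner has full control, and anyone can read.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS_URI},
         "Permission": "READ"},
    ]
}

exposed = public_permissions(sample_acl)
if exposed:
    print("bucket is publicly accessible:", ", ".join(exposed))
```

Running a check like this on every configuration change, rather than during an occasional audit, is what lets a misconfigured bucket be caught before anyone outside finds it.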

Amazon Elastic File System now offers encryption of data at rest. Amazon also did a complete rewrite of CloudHSM (Hardware Security Module) so that provisioning, patching, high availability and backups are now built into the managed service. FIPS 140-2 Level 3 support is included, along with security mechanisms designed to detect and respond to physical attempts to access or modify the HSM.

Copyright © 2017 IDG Communications, Inc.
