Automating security at AWS: How Amazon Web Services operates with no SOC

Amazon Web Services CISO Stephen Schmidt explains the company's recipe for combining security automation with ways to get management and staff to take security seriously.

Amazon Web Services (AWS) has become one of the largest technology companies in the world. The cloud giant has over 55 data center locations and millions of customers.

Given the size of its customer base, it’s little surprise that its outages make headlines in a way few other companies’ do. A single human error in 2017 caused an outage in one region affecting the likes of Netflix, Reddit, Adobe and Imgur. According to one web monitoring service, more than half of the top 100 online retail sites experienced slower load times during the outage.

Operating at hyperscale requires staying on top of and preventing human error, and AWS is heavily focused on automating as many tasks as possible. This includes many of its security operations, to the point where the company has removed the need for a traditional security operations center (SOC).

Automation key to AWS security

Stephen Schmidt has been at Amazon for more than a decade, following stints at American Information Systems and the FBI. He has been AWS’s CISO since 2010 and is responsible for ensuring the security of the computer systems, networks and data centers for the entire company.

AWS has security teams in all the major regions where the company has a presence: across the U.S., Europe, and Asia Pacific. The company also embeds security engineers directly in each service team once that team reaches a certain size. Security managers are in place at each data center location to handle physical security. Schmidt has previously admitted even he needs to give prior notice to access any of the sites. AWS also has a threat intel team monitoring threat actors and their tooling and methods, and a red team that performs penetration testing.

Despite this large number of human bodies, AWS is constantly looking to remove people from the picture as much as it can. “We realized a long time ago that at our scale if you're not practicing security operations using automation, you're going to miss important things. So, we made a large investment in automating the tasks that humans used to do,” says Schmidt.

He admits that while automation takes a lot of investment, it is worth the effort in the long run, as “humans make mistakes.” Schmidt says one of the security-based goals he has set across the business is to reduce human access to data by as much as 80 percent to further drive automation and decrease the chance of human error.

“If you're depending on a human sitting in a room for a significant portion of your security posture, they're eventually going to go get coffee, they're going to be arguing with their friend on the phone, they're going to be looking at Facebook, or something,” Schmidt says. “We had to build systems internally that focused a lot on using machine learning engines as a way to reduce operator workload and to take the mountains of sensor data that we have internally and turn into something that's useful.”

No SOC and one security engineer on shift

According to a new report from CA, 98 percent of organizations are implementing automation, but the focus on removing people from the equation wherever possible at AWS has led to the removal of a traditionally essential piece of the security puzzle: the security operations center.

“We don't have a SOC. We don't have a room that has, you know, the big TV monitors in it and people watching them and that sort of thing,” Schmidt says. “I have a single on-call security engineer who is responsible for watching the automation and making sure it's functional. That rotates every six hours around the world to make sure we've got coverage in people's day times.”
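A six-hour handoff of this kind is often called a follow-the-sun rotation. As a rough illustration only, the Python sketch below picks which location holds the on-call shift at a given time. The four location names and their order are hypothetical, since AWS has not said how it divides the day; only the six-hour shift length comes from Schmidt's description.

```python
from datetime import datetime, timezone

SHIFT_HOURS = 6  # shift length, as described by Schmidt

# Hypothetical handoff order: four six-hour shifts cover a 24-hour day.
# These location names are assumptions made for illustration.
LOCATIONS = ["Asia Pacific", "Europe", "US East", "US West"]


def on_call_location(now: datetime) -> str:
    """Return the location holding the on-call shift at `now`.

    `now` must be timezone-aware so it can be converted to UTC.
    """
    if now.tzinfo is None:
        raise ValueError("now must be a timezone-aware datetime")
    utc_hour = now.astimezone(timezone.utc).hour
    shift_index = utc_hour // SHIFT_HOURS  # 0..3 across the day
    return LOCATIONS[shift_index % len(LOCATIONS)]
```

Calling on_call_location(datetime.now(timezone.utc)) returns the location currently holding the shift under these assumed boundaries.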

Schmidt says that lone engineer can page in people and bring more resources to bear if necessary, but he claims operations have gotten to the point where the automation is good enough that most actions don't require an engineer. “For example, we have a system that watches our internal accounts and our internal staff use of our resources, and when it detects a misconfiguration or misuse, it automatically cuts a ticket to the staff members,” he says. “If they remediate we'll actually check the remediation to ensure it's adequate and then close it automatically. We're in the high 90 percent where we don't have to have a security engineer involved.”
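The loop Schmidt describes (detect, cut a ticket, re-check the fix, close) can be sketched in a few lines of Python. This is a minimal illustration rather than AWS's actual system: the inventory format, the public_read setting, and every function name here are assumptions invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Ticket:
    resource: str
    owner: str
    issue: str
    status: Status = Status.OPEN


def find_issue(config: dict) -> Optional[str]:
    """Hypothetical check: flag a resource that anyone can read."""
    if config.get("public_read"):
        return "public_read is enabled"
    return None


def cut_tickets(inventory: Dict[str, dict]) -> List[Ticket]:
    """Scan every resource and open a ticket for its owner on each finding."""
    tickets = []
    for name, config in inventory.items():
        issue = find_issue(config)
        if issue is not None:
            tickets.append(Ticket(name, config["owner"], issue))
    return tickets


def recheck(tickets: List[Ticket], inventory: Dict[str, dict]) -> List[Ticket]:
    """Re-run the check on each open ticket and close it if the fix holds.

    Returns the tickets that are still open, which are the ones a human
    engineer might eventually need to look at.
    """
    still_open = []
    for ticket in tickets:
        if ticket.status is not Status.OPEN:
            continue
        if find_issue(inventory[ticket.resource]) is None:
            ticket.status = Status.CLOSED  # remediation verified, no human needed
        else:
            still_open.append(ticket)
    return still_open
```

The point of the design is that a ticket is never closed on the owner's word alone: the same check that raised the finding has to pass again before the system closes it.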

However, one area of security that is still largely manual is reviewing penetration tests. The company supplements its own internal red team efforts with external companies to test the abilities of both. “We grade each one of our vendors based on what we find; what they don't,” Schmidt says. “It gives us both a measure of the relative success level of our internal teams versus what external folks can bring to bear, but it also gives us a level of surety on the quality that the external companies bring.”

As well as constantly comparing the relative skills of the external and internal teams, every time a new vulnerability comes to light that might affect AWS, the company will go back and assess whether it should have been found earlier. “We do a very rigorous scoring process for vulnerabilities or problems with software that weren't discovered during the penetration testing or red-teaming process and then surfacing late: Who did the tests, and should they have found this or was this something that was very unusual?” says Schmidt.

Poaching machine learning talent from academia

The company has a number of R&D centers around the world, and often each location has its own specialization. The Cambridge, UK, center, for example, is the heart of the company’s Alexa and drone development. There is no single hub for security development, however, as the company prefers to go wherever the necessary talent is. “We look for areas that are farther out on the horizon and we say, 'what is the kind of fundamental research that we think has a decent chance of yielding something that we can actually use in a couple of years?'” says Schmidt.

Once an area that fits the requirement is identified, Schmidt says the company finds academics who are working in that area and tries to bring them into the fold. “Not all academics are suited to or want to build services or systems for general consumption, but there are groups of people who really like building something concrete; they want to see their idea turn into a real service.”

One example is Byron Cook, a professor of computer science who was recruited from University College London (UCL) to head up the Automated Reasoning Group (ARG) team, which is dedicated to automating areas around testing, configuration and validation. “The ARG were in the security team, and we built the tooling for security purposes to devise automated tests that prove the thing is functioning the way it's expected to,” says Schmidt.

Many of the company’s recently released security products have focused on automation, including Amazon Macie (designed to identify and protect sensitive data), Amazon GuardDuty (its threat detection service) and both Tiros and Zelkova, which were developed by the company’s ARG team. TechCrunch has reported that the company is working on two more automated validation tools, Quivela and SideTrail, both of which are cryptography-based security services coming out of the ARG unit.

Eating their own dogfood to create security products

Amazon is proud of how it does product development. The whole cloud division started as a way to fix internal issues around scale and to make the underlying eCommerce platform easier for external partners to work with, before becoming an industry giant. Many of its services begin life as internal tools designed to manage the platform itself.

Schmidt says it is common for customers to see an internal AWS security tool and ask for something similar. Once a product has been chosen, it takes around a year to turn a rough-around-the-edges internal tool into something productized and customer-friendly. “If we are building software for our own use we always ask the question, ‘Should we be externalizing this?’”

The security tools AWS offers customers are the same ones used to manage the platform. Schmidt’s security team is a massive consumer of event-driven Lambda operations, while Amazon SageMaker (a machine-learning-as-a-service model maker) is used for prototyping machine-learning-based security tools and developing models for things such as log analysis. The company also has other machine-learning-based security operations that might never be externalized.

“We do a lot of configuration examination internally to make sure that there are systems that are functionally correct,” says Schmidt. “We use a lot of machine learning to build models on how different classes of systems are configured versus should be configured. Whether that's something that is generally useful [for customers] I don't know.”

Internal security taken seriously, but also as a game

While many security officers report into the CIO or the risk/legal functions, Schmidt reports directly to AWS CEO Andy Jassy. Schmidt explains this decision was made to ensure security was taken seriously and the role of CISO had the right level of access and visibility within the company.

“Andy has a meeting every single week with me, Charlie Bell [SVP, utility computing services], and the senior VPs in the company where we go over tactical security issues for the week,” says Schmidt. “How many CEOs are looking at Heartbleed-like stuff every day, or are looking at security improvements that are required every week?”

Despite the increasing focus on automation and removing the human element, Schmidt still has to contend with people. To make those outside the security function take security more seriously, Schmidt uses what is almost gamification to achieve his goals. Every month a report goes out showing how every VP is doing against predefined security expectations, essentially acting as a scoreboard.

“These are intensely competitive people, so what's the first thing we do? We stack rank them against their peers,” says Schmidt. “They all immediately flip to the chart to see who's on the top versus who's on the bottom. And those who are down farther turn to the keyboard [and] start banging away at their teams, which is exactly the response that I want.”

“It's one of those circumstances where we get to use the way our senior leaders think to the advantage of the security team.”

Copyright © 2018 IDG Communications, Inc.
