Gartner last year estimated that Amazon Web Services was five times bigger than its next 14 competitors combined. That’s a lot of virtual machines. And they all run on a customized version of the open source Xen hypervisor, so when the Xen code has a security vulnerability, that’s a big deal for AWS.

In the past six months AWS has twice had to reboot some of its Elastic Compute Cloud (EC2) servers because of a Xen vulnerability. In September 2014 about 10% of EC2 instances were rebooted, and just this week AWS announced that about 0.1% of instances had to be rebooted to install a security patch. That may not sound like a lot, but at the scale AWS operates, it’s still a large number.

What happens inside AWS when a Xen vulnerability is discovered?

The answer is that Steve Schmidt gets busy (not that he isn’t already). Schmidt is AWS’s vice president of security engineering and chief information security officer (CISO), and a former FBI section chief. He’s the man keeping AWS’s cloud secure. In November, Network World sat down with Schmidt at the AWS re:Invent conference and asked him to walk us through what happened inside AWS’s cloud operations during the big September reboot.

Verify the vulnerability

AWS is a big user of Xen code, so company officials are some of the first to hear about Xen vulnerabilities that are identified in the open source community. When that occurs, the first job for Schmidt’s team is to determine whether it will affect AWS. The company is notified of all the Xen vulnerabilities on a regular basis before they’re made public.
This allows the company to determine whether the vulnerability is applicable to AWS and, if so, to develop and install a patch.

“Xen is a huge software package and there are many aspects that AWS does not use,” Schmidt said.

Most of the Xen vulnerabilities do not apply to AWS because the company has developed its own custom version of Xen. AWS has stripped out all the features of Xen that it doesn’t need, both to tune the performance of the open source code to the company’s unique use case and to limit its exposure to vulnerabilities.

But AWS does something else, too: It doesn’t just use one flavor of Xen, it uses many.

“We intentionally build our fleet differently across (the service),” Schmidt said. “You don’t want everything to be homogeneous because if it is, then if a problem affects the fleet, it affects everything.” AWS has different custom versions of Xen deployed across different services and regions, and none of them are the vanilla open source code.

An internal hack

If the AWS cloud is affected, the company tries to hack itself.

“We generate a test scenario to determine if we can trigger the vulnerability,” Schmidt said. Then, extensive testing is done to determine whether the vulnerability has been used against AWS.

Meanwhile, other teams of security engineers are already building a patch and testing it across all the variants of Xen that AWS runs to ensure it meets security and performance requirements.

Sometimes the process of installing the patch requires a reboot, as it has twice in the past half-year. Just like on a common PC, some updates and patches require a reboot and others don’t. The majority of patches AWS implements do not require a reboot; AWS has architected its system to minimize the reboots necessary to patch its services.

“We try very hard not to reboot,” Schmidt said.
If Schmidt’s team finds it “technically infeasible” to install the patch without a reboot, it notifies customers which services will be restarted.

The dreaded reboot

“It was very straightforward,” Schmidt said, referring to the September issue. “We couldn’t find a way to patch the service without rebooting, so we had to do it.”

Complicating efforts in situations like this is the fact that AWS has to inform customers that some of their EC2 instances need to be rebooted, but it can’t say why. AWS can’t announce the vulnerability to the world and expose itself or other Xen users.

Customers should be ready for a reboot at any time, though, and there are steps they can take to ensure their systems can withstand a reboot or VM failure. One is to design their systems to be stateless, so that if there is a reboot or a VM failure, the application fails over to healthy VMs without skipping a beat.

Back in September, Network World spoke with a handful of AWS users, and most survived the reboot without a major issue. Born-in-the-cloud apps tend to be resilient to failure; legacy apps that have been migrated tend to have more trouble.

Schmidt said AWS is always looking to improve its services, both technically, to ensure it doesn’t have to reboot VMs, and by keeping customers better informed. Part of that process includes sponsoring academic research, including some leading studies into how Xen servers can be hot-patched without requiring a reboot.
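
To make the stateless-design advice above concrete, here is a minimal Python sketch. It is an illustration only, not how AWS or any particular customer builds its systems. The names `SharedStore`, `Instance`, and `route` are hypothetical, and the in-memory store stands in for an external database or cache that would survive a VM reboot.

```python
from dataclasses import dataclass


class SharedStore:
    """Hypothetical stand-in for an external state store that outlives any single VM."""

    def __init__(self):
        self._data = {}

    def increment(self, key):
        # All application state lives here, not on the instances.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


@dataclass
class Instance:
    """Hypothetical VM running a stateless request handler."""
    name: str
    healthy: bool = True

    def handle(self, store, user):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is rebooting")
        # The instance keeps no per-user state of its own.
        return store.increment(user)


def route(instances, store, user):
    """Try each instance in order, skipping any that are unavailable."""
    for inst in instances:
        try:
            return inst.name, inst.handle(store, user)
        except ConnectionError:
            continue
    raise RuntimeError("no healthy instances available")


store = SharedStore()
fleet = [Instance("vm-a"), Instance("vm-b")]

first = route(fleet, store, "alice")    # served by vm-a
fleet[0].healthy = False                # simulate vm-a being rebooted
second = route(fleet, store, "alice")   # fails over to vm-b; the count continues
```

Because the request count is kept in the shared store rather than on `vm-a`, the second request picks up where the first left off even though a different instance serves it.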