Amazon's cloud failed: How can your cloud be better?
How should the industry respond to a cloud failure like Amazon's? Gregory Machler looks at the root cause and examines how the weakness can be addressed within the cloud product industry.
By Gregory Machler
May 10, 2011 — CSO —
Amazon's cloud services failure will likely make corporations reluctant to deploy solutions in the public cloud. Companies will probably focus on private cloud solutions until they believe it is safe to dip into the public cloud. The Amazon outage was caused by an improper configuration of network infrastructure components. Human error led to a gigantic cloud failure and financial losses.
The failure points to a significant weakness in the cloud. I mentioned in an earlier disaster recovery article that critical infrastructure products have too many features and models. Cloud products need to be like cars: a car line offers a few common engine configurations, and cloud products should offer a similarly small set of features. There also needs to be a limited number of cloud product models, just as there is a limited number of car types.
The whole cloud system needs far fewer permutations so that the integration of those products can be properly tested for disaster recovery. Too many permutations are too expensive to test. Some pieces of software, like an Energy Management System (EMS), which controls power grids, have complex finite state machines, sophisticated power algorithms, and full system failover capabilities. But, as with many software products, some error paths are never tested.
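To make the cost of permutations concrete, here is a minimal sketch of the arithmetic. The layer names and model counts are hypothetical assumptions chosen for illustration, not figures from any real provider; the point is only that the number of full-stack combinations is the product of the choices at each layer.

```python
from math import prod

# Hypothetical number of models/configurations offered at each layer.
# These counts are illustrative assumptions, not measurements.
wide_catalog = {
    "server": 12,
    "storage": 10,
    "network_switch": 15,
    "hypervisor": 4,
    "load_balancer": 6,
}
narrow_catalog = {
    "server": 3,
    "storage": 2,
    "network_switch": 3,
    "hypervisor": 2,
    "load_balancer": 2,
}


def integration_permutations(catalog):
    """Count distinct full-stack combinations, picking one model per layer."""
    return prod(catalog.values())


print(integration_permutations(wide_catalog))    # 43200
print(integration_permutations(narrow_catalog))  # 72
```

Under these assumed counts, trimming each layer to two or three models cuts the combinations to test from 43,200 to 72, which is the difference between an untestable matrix and one a disaster recovery team could plausibly cover.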
Unlike an EMS, cloud services need to avoid untested permutations by making integration simple through modular products. The complexity is hidden within the products and doesn't adversely affect integration. As with large aerospace, telecommunications, and defense projects, there is a need for cloud systems architects who are responsible for properly integrating and testing multiple vendors' products. They can analyze the risk within the products and in their integration. If they see weaknesses, they can turn to other product vendors. They can also enforce a limit on the number of cloud permutations the service provider or company will deploy.
Involving architects in the design of these solutions will put positive pressure on the cloud product providers. Architects will steer product selection toward products that meet the requirements and integrate simply. Let's call these products cloud-aware. Such products could support a limited number of pre-defined templates and integrate well with other products. The use of templates allows these products to integrate with little intervention.
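One way to picture the template idea is as a simple compatibility check. The sketch below is an assumption-laden illustration, not a description of any vendor's product: the `Product` type, the template names, and the `approve_stack` helper are all hypothetical. Each product declares the templates it supports, and the architect approves a stack only for templates that every product shares and that are on the organization's short approved list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A hypothetical cloud-aware product and the templates it supports."""
    name: str
    vendor: str
    templates: frozenset


def common_templates(products):
    """Return the templates supported by every product in the proposed stack."""
    if not products:
        return frozenset()
    shared = products[0].templates
    for product in products[1:]:
        shared = shared & product.templates
    return shared


def approve_stack(products, approved_templates):
    """Return the approved templates that all products in the stack support."""
    return sorted(common_templates(products) & frozenset(approved_templates))


# Hypothetical products and template names, for illustration only.
switch = Product("switch-a", "VendorX", frozenset({"small", "medium"}))
storage = Product("array-b", "VendorY", frozenset({"medium", "large"}))
hypervisor = Product("hv-c", "VendorZ", frozenset({"small", "medium", "large"}))

print(approve_stack([switch, storage, hypervisor], {"medium", "large"}))
```

An empty result would tell the architect that the proposed products share no approved template, which is the signal to look at other vendors rather than build a one-off integration.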
Using an architect is common today. How does one go about finding a good one? I recommend a generalist who is good at the 'big picture'. When large projects like the Brooklyn Bridge were developed, a generalist architect often led the project. Often these leaders weren't the smartest people on the project; many of the niche architects were smarter and potentially more detail oriented. But the generalists were good communicators, focused on critical design issues, and resolved disagreements well. They implemented ideas from the best architects and moved the project forward.