What is an RSO? A “reliability seeking organization,” as described in Vanderbilt Professor Rangaraj Ramanujam’s book Organizing for Reliability. We tend to think of cybersecurity as black and white: breach or no breach. We often focus on architecture, threats and defenses. But we should also be concerned with the reliability of the security program itself. Here we define reliability as including performance consistency and resiliency; “fault tolerant” is another descriptive term.

Many types of organizations have already developed highly reliable business processes. Achieving such goals requires both strategy and execution. I contend that security practitioners can learn much from these organizations, and from venturing outside the security bubble.

Security = reliability

This blog post was first inspired by a lecture Professor Ramanujam gave in Nashville. The post is all about learning from outside the security silo. The lecture was hosted by the Nashville chapter of the Association of Contingency Planners (ACP), and its topic was High Reliability Organizations. Today’s security agenda overlaps with that of groups such as ACP; ransomware and DDoS attacks are two examples. I subsequently read Ramanujam’s book. This post is a condensed version of the lecture and the book, as I think they apply to information security.

What we want from a good security program is business reliability. We want to stop unplanned work, whether it stems from breaches of confidentiality, integrity or availability. With digital data and technology integrated into every business process, the reliability of those processes depends entirely on the reliability of the information security program. Many professionals have focused on security architectures, including technology, people and processes. Some estimate that over 2,500 security startups exist, with more jumping into the field every week.
But few of them focus on achieving highly reliable security. One exception is the category of breach and attack simulation (BAS) tools, which enable continuous testing of controls (Verodin, Cymulate, SafeBreach and others). This post argues that we need to focus more on building reliable security programs.

The starting point is the concept of a High Reliability Organization (HRO). HROs are distinguished both from Reliability Seeking Organizations (RSOs) and from average organizations, which are not specifically focused on high reliability. I believe most organizations today are not yet RSOs or HROs when it comes to cybersecurity. More on how to become a cybersecurity RSO or HRO later.

What is reliability?

In this context, reliability means that the system will not fail to do what is expected of it. There are two sides to the definition. First is the logic of anticipation: have the appropriate controls and metrics been built in? Second is the logic of resilience: is the system capable of containing the results of a breach? Equally important, does the security program emerge from a breach stronger than before? Unfortunately, breaches (and other failures) often lead to finger pointing, executive dismissals and everything except real improvement.

What are we looking for in a reliable security program? First, performance consistency with low variance. Today’s focus is on periodic consistency based on annual or quarterly audits. Second, intermediate events and near misses must be tracked. Too often, they are put at the bottom of the work queue for future investigation. Third, resilience both before and after a breach or other event. Today’s definitions of security resilience emphasize responding to attacks only after they have hit the headlines.

A key lesson from HRO research is the importance of a systems approach to reliability.
While popular accounts of breaches tend to blame “the operator,” “the admin,” “the outsourcer” or “the CISO,” real incidents have many causes. Josephine Wolff makes this point very effectively in her book You’ll See This Message When It Is Too Late. Wolff presents a blow-by-blow analysis of recent security breaches, showing clearly that each had many causes.

Graeme Payne’s recent book The New Era of Cybersecurity Breaches explains exactly what happened in the 2017 Equifax breach. Conventional wisdom is that the company failed to patch an Apache Struts instance. Payne documents how the failure to promptly forward a single email message led to the incident. A reliable, fault-tolerant security system would not depend on one person quickly forwarding any message.

How to build an HRO

Three processes have proven successful in building HROs. None of them will be new to security practitioners. However, I hope that evidence that these processes work (and how they work) in other contexts will move them higher on the security community’s list of priorities. The three processes are continuous learning and improvement (Chapter 7 of Organizing for Reliability); compliance processes vs. risk-based processes; and managing for high reliability.

Continuous learning is a key component of building a high reliability process. HROs make use of the Disaster Incubation Model (DIM), which describes the six steps leading to disasters or reliability failures. Think of it as the risk management equivalent of the kill chain. The six steps of the DIM are:

1. Starting point
2. Incubation period
3. Precipitating event
4. Onset
5. Rescue and salvage
6. Full cultural adjustment

Continuous learning ideally takes place in the “incubation period,” before any disaster occurs. Key ideas here are vicarious learning (ISACs and ISAOs; peer intelligence services like smarthive.io) and learning from small failures.
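To make “learning from small failures” a little more concrete, here is a minimal Python sketch, assuming near misses are logged with a tag naming the underlying condition that produced them. The NearMiss record, the condition labels, the sample log and the recurring_conditions helper are all hypothetical illustrations, not taken from Ramanujam’s book or from any product. The sketch simply counts how often each condition recurs, so that conditions generating repeated near misses surface during the incubation period.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NearMiss:
    """One small failure or near miss (hypothetical record format)."""
    observed: date
    condition: str    # hypothetical label for the underlying hazardous condition
    description: str


def recurring_conditions(events, threshold=2):
    """Return (condition, count) pairs seen at least `threshold` times, most frequent first.

    A condition that keeps producing near misses is a candidate hazard to
    correct before it contributes to a major failure.
    """
    counts = Counter(e.condition for e in events)
    return [(c, n) for c, n in counts.most_common() if n >= threshold]


# Invented sample log, for illustration only.
log = [
    NearMiss(date(2024, 1, 10), "patch-advisory-not-routed", "Advisory reached one admin only"),
    NearMiss(date(2024, 2, 3), "asset-inventory-gap", "Scanner missed a server"),
    NearMiss(date(2024, 2, 20), "patch-advisory-not-routed", "Vendor email sat unread"),
]

print(recurring_conditions(log))  # [('patch-advisory-not-routed', 2)]
```

Counting by underlying condition, rather than by individual ticket, reflects the idea that near misses and large failures share the same root conditions; the threshold is an arbitrary illustrative choice.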
I would be curious to know whether Equifax’s patch management processes had experienced other gaps before the Apache Struts patch disaster. The key concept here is:

“...because near misses are generated by the same conditions that lead to large failures, if organizational decision makers could identify and correct hazardous conditions through experiencing and learning from near misses, they may be able to reduce the likelihood that their organizations would experience major failures in the future.” – Rangaraj Ramanujam, Organizing for Reliability

Enough said on this point.

Compliance vs. risk

Compliance vs. risk is a topic often discussed by cybersecurity leaders. The prevailing opinion is that while compliance requirements can help obtain budgets, risk analysis is necessary to build a secure organization. This attitude helps security professionals keep their jobs AND obtain funding! Given the vast amount of work needed to become “compliant,” it is easy to be fooled into thinking your organization is effectively managing risk.

Many of the industries profiled in Ramanujam’s book are highly regulated, such as healthcare, nuclear power and airlines. These industries have accumulated many years of experience with, and analysis of, the effectiveness of regulation. What findings can we apply to cybersecurity? One is that a reliable system cannot be obtained through regulation. Regulators simply do not have enough information. Government regulations also lag behind the state of the industry and end up being watered down as they are drafted. The Capital One breach is a case in point. Banking is one of the most highly regulated industries, yet cloud-based third-party risks apparently escaped the regulators’ purview.

Given that cybersecurity is regulated, what practices can we adopt from the experience of HROs regarding compliance and regulation?
The distinction between goal-focused regulation and error-focused regulation is an important concept. Most compliance regimes focus on the former, i.e., meeting control objectives. However, organizations may benefit from enhancing their internal error-detection capabilities. Another applicable point relates to extended organizations: in many cases, managing the regulatory and reliability implications of the organization’s supply chain may be the biggest risk the organization faces.

Managing for security has recently become a science. CISOs now present to the board and know to support business initiatives rather than be the department of “no.” HROs and RSOs, however, have been managing for high reliability for decades. The “three lenses” view of organizations leads to three parallel paths toward building a cybersecurity HRO. If you fail to look through all three lenses, you will likely not achieve your goals. The three lenses are:

1. Strategic design
2. Political
3. Cultural

Most CISOs with technical backgrounds will readily use the “strategic design” lens. It covers the organization of the CISO’s team and its interfaces with business operations and IS operations. The second is the “political” lens; some CISOs may be less able or less inclined to view organizational security through it. The objective here is to build alliances to meet security goals. The third is the “cultural” lens.

One of the challenges many security leaders face is how to move their organization toward a better security culture. Many CISOs might say, “Things would be great if only we had a better security culture.” Some will say we need a big security breach, and then we will see a change to a better culture. The experience of RSOs shows this is not true. Disasters may or may not help; finger-pointing may be the only outcome.
The 30-day security sprint ordered after the OPM breach is an example of a nonproductive response to a cybersecurity disaster.

From theory to practice

What can we learn from RSOs and HROs about improving culture, and how can we apply it to cybersecurity culture? One idea is to maintain a library of breaches from your industry and use this information to guard against the small errors that will show up in the “incubation period.” If you don’t know the causes of specific breaches, how will you set up an effective defense? (“Those who cannot remember the past are condemned to repeat it,” said George Santayana. I think he was referring to the practice of information security management.) Roger Grimes makes this point succinctly in his book A Data-Driven Computer Defense.

Better employee training can also go a long way toward establishing a security culture. I expressed my “thumbs down” opinion of “awareness training” in “Time to kill security awareness training.” Today we need to educate all employees toward a culture of risk management. More and more attacks simply ride on normal business processes (phishing, BEC, credential stuffing), as opposed to specialized attacks on technology. Micro-credentials for cybersecurity represent a new approach to helping all employees master the risk skills they need. A micro-credential is more focused than a full degree or a security certification like the CISSP. Why is it valuable for information security? Simply because it can efficiently teach each user exactly what she needs to know about security, and no more.

All security practitioners face the challenge of building a reliable program. Unfortunately, there is little research on successful transitions to a cybersecurity RSO or HRO. Building a truly reliable and self-sustaining security program seems like starting a fire in the woods.
All the security manager can do is provide kindling, wood, logs and air. A self-sustaining blaze starts when exactly the right configuration is found. To get that configuration, follow the management approaches outlined here, and be persistent.