The purpose of Business Continuity Event Management (BCEM) is to reduce harm to employees, customers, investors, and the business when an unexpected business interruption, known as a business continuity event (BCE), occurs. In this post, I provide an overview of how to manage a BCE. In future posts, we’ll incrementally expand this high-level view into a BCEM plan.
When responding to a catastrophic BCE, the first consideration is protection of human life. Although we’ll address this in our plan, most of our focus will be on managing the smaller events which happen much more frequently.
The second consideration is the restoration of information processing services. Finally, we need to mitigate the root causes: the people, process, or technology weaknesses that enabled the event. BCEM that effectively addresses these areas produces the following benefits for the organization:
- The business impact of each incident is minimized
- Human safety is addressed
- Corporate liability due to lack of due diligence is mitigated
- Regulatory requirements are met
- The organization’s public image is protected by a fast, professional response
Meeting these objectives requires a process-based approach. The steps in the process we’ll follow as we move toward a complete BCEM plan are depicted in Figure 1.
Figure 1: BCEM Process
- Prepare – Optimal mitigation of business impact is rarely possible unless the organization plans for probable events. Preparation includes development of manual processes, implementation of redundant systems, documentation of mitigation and recovery plans, and training of key personnel.
- Detect – Early detection of a service interruption helps minimize harm. Detection tools and techniques are a critical element of a BCEM strategy.
- Contain and Mitigate – Upon detection, the response team should act as planned during the Prepare step, either quickly recovering the failed process or system or implementing interim, mitigating processes. For example, the loss of an order entry system might result in telephone sales staff moving to paper-based order taking to continue accepting customer purchases.
- Analyze – Once service is restored or the interruption is mitigated, an analysis of the event provides understanding of root cause as well as possible issues with containment and mitigation processes. Even if the service is still down, the recovery team should take the time to identify root causes before attempting remediation. Addressing symptoms instead of the disease results in inevitable recurrences of the event.
- Remediate and Measure – Using information collected in the analysis step, the recovery team uses an action plan to remove root causes and, if necessary, restore full system operation. They should then monitor and measure the effectiveness of their remediation steps.
Results of this process provide feedback to employees responsible for preparation activities. Lessons learned, especially during incident and response root cause analysis in the Analyze step, are integrated into detection, containment, and mitigation documentation. It’s also necessary to update recovery team training to reflect those changes.
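For readers who think in code, here is a minimal sketch of the cycle as a small state model. It is only an illustration of the five steps and the feedback loop described above, not part of any BCEM standard or tool. The names `Phase`, `EventRecord`, `PreparationBacklog`, and `next_phase` are hypothetical, and I am assuming an event enters the process at the Detect step and that its lessons flow back to Prepare only after Remediate and Measure is complete.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The five steps of the BCEM process, in order."""
    PREPARE = "Prepare"
    DETECT = "Detect"
    CONTAIN_AND_MITIGATE = "Contain and Mitigate"
    ANALYZE = "Analyze"
    REMEDIATE_AND_MEASURE = "Remediate and Measure"


# Enum members iterate in definition order, which is the process order.
CYCLE = list(Phase)


def next_phase(current: Phase) -> Phase:
    """Return the step after `current`, wrapping back to Prepare."""
    index = CYCLE.index(current)
    return CYCLE[(index + 1) % len(CYCLE)]


@dataclass
class EventRecord:
    """Hypothetical record of one BCE as it moves through the process."""
    description: str
    phase: Phase = Phase.DETECT  # assumption: an event starts when detected
    lessons: list[str] = field(default_factory=list)

    def advance(self) -> Phase:
        """Move the event to the next step and return that step."""
        self.phase = next_phase(self.phase)
        return self.phase


@dataclass
class PreparationBacklog:
    """Hypothetical list of updates owed to plans and training."""
    items: list[str] = field(default_factory=list)

    def absorb(self, event: EventRecord) -> None:
        """Feed an event's lessons back into preparation work."""
        # Assumption: lessons are accepted only after the full cycle.
        if event.phase is not Phase.PREPARE:
            raise ValueError("event has not completed the cycle yet")
        self.items.extend(event.lessons)


# Usage: walk the order entry outage example through the cycle.
event = EventRecord("Order entry system outage")
event.advance()  # Detect -> Contain and Mitigate
event.advance()  # Contain and Mitigate -> Analyze
event.lessons.append("Document the paper-based order-taking fallback")
event.advance()  # Analyze -> Remediate and Measure
event.advance()  # Remediate and Measure -> Prepare (feedback)

backlog = PreparationBacklog()
backlog.absorb(event)
```

The guard in `absorb` is a design choice, not a requirement: a real team may well update documentation as soon as the Analyze step produces findings.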