Excerpted from Building an Enterprise-Wide Business Continuity Program by Kelley Okolita (CRC Press, 2009).
What is a Business Impact Analysis?
The next step in the planning process is to perform a business impact analysis (BIA). The BIA becomes the foundation of the plan you will build for your recovery. This is the process that will determine what needs to be recovered and how quickly. It is one of the most difficult tasks to perform and one of the most critical to get right. The more time you have to bring a business function back in service following a disaster, the more your recovery options increase. The BIA is invaluable for identifying what is at stake following a disaster and for justifying spending on protection and recovery capability. Nobody but you will mind your own business.
Why Business Impact Is About Time Sensitivity, Not Criticality
I dislike the use of the terms "critical" or "essential" in defining the processes or people involved in this phase of the planning. I prefer to use the term "time-sensitive." Generally speaking, organizations do not hire staff to perform non-essential tasks. Every function has a purpose, but some are more time-sensitive than others when there is limited time or resources available to perform them. A bank that has suffered a building fire could easily stop its marketing campaign but would not be able to stop processing deposits and checks written by its customers. The bank's marketing campaign is essential to its growth in the long term, but in the middle of a disaster it will take a backseat, not because it is not critical but because it is not time-sensitive.
The organization needs to look at every function in this same light. How long can the company not perform this function without causing significant financial losses, significant customer unhappiness, or significant penalties or fines from regulators or from lawsuits?
How To Do This and Get It Right
It is all about impact. It is all about what keeps the business running and what can wait till later. When I was doing mid-range and client-server DR for a company, I had to speak to the business unit that managed the general ledger for the company. The general ledger is concerned with accounts payable and receivable. It is just like your checkbook. It is where a business keeps track of the monies coming in for payment of goods or services and those going out to pay for expenses such as payroll. In this company, the general ledger ran on an AS400, and my job was to figure out how long I had before I needed to bring back the system. When I met with the business unit, the first response was that it had to be back by day one after a disaster.
My response was that I was willing to build whatever recovery strategy the business needed and was willing to pay for, but before I priced this strategy, I wanted the team to think about something. This is a financial-services firm. If we did not run the general-ledger system for 30 days, it would be ugly. There is no question that we would have to cut manual checks to keep critical services going and have to maintain a manual general ledger until the system was brought back. I would not want to be the accountant who had to reconcile all the manual-ledger entries into the application once it was restored, but the firm would survive as a business if it did not run the general ledger for a month. How long do you think we would survive as a business if we did not answer our phones? Price our mutual funds? Process our customers' transactions?
It is not about being important. When business is normal, the general ledger is very important. It is about what keeps us in business. It is about surviving. Disasters are not about business as usual. Management metric reporting is very important when business is normal. My CEO expects his management reports on his desk at 7:00 a.m. every business day. But if the home office burnt to the ground, I know he would be willing to forgo seeing them for a few days!
All business functions and the technology that supports them need to be classified based on their recovery priority. Recovery time frames for business operations are driven by the consequences of not performing the functions. The consequences may include business lost during the down period, contractual commitments not met that result in fines or lawsuits, lost goodwill with customers, and so on. Impacts generally fall into one or more of these categories: financial, regulatory, or customer retention. Remember, these were the same categories we talked about in Chapter 2.
What steps can you give your planning team to conduct a business impact analysis? It starts with simply identifying the processes or functions performed in their area. Working with the management team, list everything that is done by that group. Once the business processes are understood, each one must be analyzed against three areas: financial risk of not performing that function, regulatory risk of not performing that function, and customer or reputational risk of not performing that function.
Financial risks may include loss of revenue from sales, loss of interest on bank balances, the cost of borrowing to meet cash flow, interest value on deferred billings, penalties from not meeting contractual commitments or service levels, opportunity lost during the downtime, and losses from processing transactions at market risk as of the date received.
Regulatory risk may include penalties for not filing financial reports or tax returns on time, fines or penalties for noncompliance with regulatory requirements in place for your business, or the need to pull products off shelves because of lost product-testing information.
Customer or reputational risk includes loss of customer confidence and market share, liability claims, customer dissatisfaction with service, media coverage of customer complaints, loss of goodwill, and loss of competitive advantage.
It is all about impact. What happens to the company if we do not do this?
Once your planning team has a list of functions and what happens if they are not performed, the next question to be answered is, how soon do we start to see the impact? Is it as soon as we stop doing something? A customer call center that has been evacuated due to a fire stops performing its function immediately. Unless there is another call center someplace else that is fully equipped and staffed to take calls, the impact to your customers is immediate. How significant this impact is depends entirely on your business: how many calls you get and what the calls are about.
Let's say your call center receives an average of 1200 calls per hour and on average, 72 percent of those calls result in a sale with an average value of $57. Do the math: 1200 x 0.72 x $57 = $49,248, the potential loss per hour that the call center is not operational.
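The same arithmetic can be written as a small function so that you can rerun it with your own figures. This is only a sketch: the function name and parameter names are illustrative assumptions, and the inputs are the example numbers from the paragraph above.

```python
def hourly_loss(calls_per_hour: float, sale_rate: float, avg_sale_value: float) -> float:
    """Potential revenue lost for each hour the call center is not operational."""
    return calls_per_hour * sale_rate * avg_sale_value


# Example figures from the text: 1200 calls per hour, 72 percent sales, $57 average.
loss = hourly_loss(1200, 0.72, 57)
print(f"${loss:,.0f} per hour")  # $49,248 per hour
```

Multiplying the result by the number of hours you expect to be down gives a rough total, assuming call volume stays at its average for the whole outage.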
If your customers or potential customers find your product or service and place their orders on your website and it goes down, you have an immediate impact. Again, how significant the impact is depends on your business—how many orders you take, how much each order is for, and whether the customers will wait and order from you later or take their business elsewhere.
After your planning team has a list of functions, an idea of what happens when they stop, and how quickly you start to see the impact, the next question to be answered is, how much impact? You can use quantitative measures such as actual dollars per minute, hour, or day of downtime or qualitative measures, which predict certain outcomes based on the knowledge or experience of the individual.
Once all that information is pulled together, you have a view of everything the company does, what impact it would have if the function could not be performed, how quickly that impact would be felt, and how significant the impact will be. This information is the start of what we need to develop the appropriate recovery strategies for each site we do business in.
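One way to hold that view is a simple record per function. The sketch below is an assumption about how you might organize the information, not the layout of the book's form; the field names, sites, and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionImpact:
    name: str
    site: str
    hours_before_impact: float      # how soon the impact is felt
    financial_loss_per_hour: float  # quantitative measure, in dollars
    regulatory_impact: str          # qualitative: "high", "medium", or "low"
    customer_impact: str            # qualitative: "high", "medium", or "low"


# Hypothetical entries for illustration only.
functions = [
    FunctionImpact("Marketing campaign", "Site A", 336, 0, "low", "low"),
    FunctionImpact("Call center", "Site A", 0, 49_248, "low", "high"),
]

# List the functions whose impact is felt soonest first.
for f in sorted(functions, key=lambda f: f.hours_before_impact):
    print(f"{f.name}: impact after {f.hours_before_impact} hours")
```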
A SIMPLE BIA FORM
Exhibit 6.1 shows a simple BIA form for classifying functions and determining their time-sensitivity code, which is shown in Exhibit 6.2. To use this form, the planner will need to adjust the factors to reflect the business being evaluated. The factors that may need to be adjusted are "time before impact," what would be considered high, medium, or low in the "customer impact" and "regulatory impact" columns, and the dollar values in the "financial impact" column.
Have each of your planning-team members complete the form for his or her functional areas and return it to you. You will then need to add it up by site to understand the functions in each site that you will be planning for and the recovery time frame associated with that function.
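If the returned forms are captured electronically, rolling them up by site takes only a few lines. This is a minimal sketch: the tuples stand in for completed forms, and the sites, functions, and codes shown are hypothetical.

```python
from collections import defaultdict

# Hypothetical completed forms: (site, function, time-sensitivity code).
forms = [
    ("Site A", "General ledger", "C"),
    ("Site A", "Call center", "AAA"),
    ("Site B", "Transaction processing", "AA"),
]

# Codes from most to least time-sensitive.
PRIORITY = ["AAA", "AA", "A", "B", "C", "D"]

by_site = defaultdict(list)
for site, function, code in forms:
    by_site[site].append((function, code))

# Within each site, show the most time-sensitive functions first.
for site, entries in by_site.items():
    entries.sort(key=lambda entry: PRIORITY.index(entry[1]))
    print(site, entries)
```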
A common question I am asked during this phase of the planning is, what if this particular function is only time-sensitive at specific times of the month or year, like month-end or year-end? My response is that you should rate that function at its highest level. If you have a disaster and it is not the time when the process needs to be resumed quickly, those resources can be used for other things, but it is better to plan for the worst-case scenario for each function because disasters are very good at occurring at the worst possible time. Ask Murphy. In fact, like many others, I think Murphy was an optimist.
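In practice, rating a function at its highest level just means taking the maximum of its ratings across the periods you evaluated. The period names and rating totals below are made up for illustration.

```python
# Hypothetical rating totals for one function at different times of the year.
ratings_by_period = {"normal": 8, "month-end": 16, "year-end": 27}

# Plan for the worst case: use the highest rating the function ever reaches.
planning_rating = max(ratings_by_period.values())
print(planning_rating)  # 27
```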
Exhibit 6.2 Business Function Recovery Time Sensitivity Codes
Rating total of 45 or more =
AAA Immediate Recovery
Must be performed in at least two geographically dispersed locations that are fully equipped and staffed.
Rating total of 25 to 44 =
AA Up to 4 hours to recover
Must have a viable alternate site that can be staffed and functioning within the required four-hour time frame.
Rating total of 15 to 24 =
A Same Day Recovery
Must be operational the same business day and must therefore have a viable alternate site that can be staffed and functioning within the same business day.
Rating total of 10 to 14 =
B Up to 3 days
Can be suspended for up to 3 business days, but must have a viable alternate site that can be staffed and functioning by the fourth business day.
Rating total of 7 to 9 =
C Week 1
Can be suspended for up to a week, but must have a viable alternate site that can be staffed and functioning the second week following an interruption.
Rating total of 0 to 6 =
D Week 2 or greater downtime allowable
Can be suspended for greater than one week. A maximum number of days should be identified for this function.
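The codes in Exhibit 6.2 can be looked up from a rating total with a short function. This is a sketch rather than part of the exhibit: it assumes each band is defined by its lower bound, so a total of exactly 10 is treated as code B, and the function name is illustrative.

```python
# Lower bound of each rating band, highest first, taken from Exhibit 6.2.
BANDS = [
    (45, "AAA", "Immediate recovery"),
    (25, "AA", "Up to 4 hours to recover"),
    (15, "A", "Same day recovery"),
    (10, "B", "Up to 3 days"),
    (7, "C", "Week 1"),
    (0, "D", "Week 2 or greater downtime allowable"),
]


def time_sensitivity_code(rating_total: int) -> tuple[str, str]:
    """Return the (code, recovery time frame) for a BIA rating total."""
    if rating_total < 0:
        raise ValueError("rating total cannot be negative")
    for lower_bound, code, time_frame in BANDS:
        if rating_total >= lower_bound:
            return code, time_frame
    raise AssertionError("unreachable: the last band starts at 0")


print(time_sensitivity_code(27))  # ('AA', 'Up to 4 hours to recover')
```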