In addition, the primary server’s normal workflow must be redirected to the secondary server, which becomes, at least temporarily, the new primary server. This redirection can require significant amounts of manual configuration, with two IT teams (one at each location) working overtime to enable and troubleshoot the switch. Similar reconfiguration applies to DNS, networking, replication topology, and other infrastructure elements. Testing requirements are massive, and additional IT staff must step into place at the secondary facility while the original IT team remains pinned down trying to get the primary facility back online.
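The redirection decision itself can be reduced to a small amount of logic, even if executing it is labor-intensive. As a minimal sketch (the site addresses, port, and in-memory registry are illustrative assumptions; real failovers involve DNS updates, replication-role changes, and verification steps omitted here):

```python
import socket

# Hypothetical site registry; in practice this state lives in DNS,
# load balancers, and replication configuration.
SITES = {"primary": "10.0.0.10", "secondary": "10.1.0.10"}

def is_reachable(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Crude reachability probe: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def plan_failover(active: str, sites: dict, probe=is_reachable) -> str:
    """Return the site that should serve traffic.

    If the active site is unreachable, promote the other one.
    """
    if probe(sites[active]):
        return active
    return "secondary" if active == "primary" else "primary"
```

The hard part the article describes is everything around this decision: actually repointing clients, re-securing the new primary, and testing the result.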
“Of course, as we’re watching the big trends around 'software is eating the world' and 'every company is becoming a software company,' there are fewer and fewer organizations for whom downtime is acceptable. DR often means at least several minutes, if not more, of downtime, and of course, because you’re bringing an idle system online all of a sudden, it may not start operations smoothly. But yes – active/active architectures are best suited for organizations that cannot tolerate downtime,” Barney said.
Joseph George, vice president of product management at Sungard AS, said he wouldn’t frame the debate between the two architectures purely in terms of efficiency, because the biggest deciding factor in which resiliency tier a business selects is often what it can afford. “Clearly, if cost was not a factor, every business would have [high-availability] systems. But they typically can only afford (and need) that level for the most mission critical systems and applications,” he said.
“It is important for enterprises to 'tier' their applications to help manage the economic balance between risk and the investment to mitigate. Tiering applications, as well as mapping their interdependencies, enables optimal recovery order sequencing and allows for the most cost effective availability program for the level of application downtime and data loss the organization can afford based on business impact,” he added.
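The “recovery order sequencing” George describes is essentially a topological sort of the application dependency graph: nothing comes up before what it depends on. A small sketch using Python’s standard library (the applications and dependencies are invented for illustration):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependency map: each application lists what must be
# running before it can be restored.
DEPENDS_ON = {
    "web-frontend": {"orders-api", "auth"},
    "orders-api": {"database"},
    "auth": {"database"},
    "database": set(),
}

def recovery_order(deps: dict) -> list:
    """Sequence applications so every dependency is restored first."""
    return list(TopologicalSorter(deps).static_order())
```

Here the database is restored first, then the services that need it, then the frontend; tiering adds a second dimension by deciding how fast each of those steps must be.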
Warm DR is fine
Swike said the majority of enterprises don’t really need active/active DR; warm DR meets their needs. With appropriate bandwidth between sites, a recovery point objective (RPO) of seconds and a technical recovery time objective (RTO) of minutes to hours are very achievable. “The technology is only part of the story, though: there has to be discipline and time given to the process of DR. Having servers replicated is a great step, but if you don’t test it regularly, how would you know it’s even going to work?”
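Those two metrics are simple to compute after an incident, which is what makes them useful targets to test against. A minimal sketch (the timestamps and phase durations are illustrative assumptions):

```python
from datetime import datetime, timedelta

def achieved_rpo(failure_time: datetime, last_replicated: datetime) -> timedelta:
    """Data-loss window: anything committed after the last replicated
    change is gone when the warm site takes over."""
    return failure_time - last_replicated

def achieved_rto(detection: timedelta, promotion: timedelta,
                 verification: timedelta) -> timedelta:
    """Outage length: time to notice the failure, promote the warm
    site, and verify it actually works."""
    return detection + promotion + verification
```

Regular DR tests are how an organization learns whether the promotion and verification terms match what its runbooks assume.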
For many, DR is number 11 on their top 10 list of priorities, she said. “That in no way means that they don’t care about DR. It’s just that day-to-day issues and production projects always tend to be at the top of the list.”
Mike Weber, vice president of Coalfire’s labs division, said the key to a solid backup strategy fundamentally depends on the business needs and mission criticality of the system. Tiered models range from critical data with a very short RTO, measured in minutes, which requires streaming backups and/or replication to a redundant (but not highly available) system, down to non-critical data that can absorb a recovery measured in days.
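A tiered model like the one Weber describes can be expressed as a simple lookup from criticality to a maximum tolerable RTO and a matching strategy. The tiers, thresholds, and strategies below are illustrative assumptions, not Coalfire’s actual model:

```python
from datetime import timedelta

# Hypothetical tiering: criticality -> (max tolerable RTO, strategy).
TIERS = {
    "critical": (timedelta(minutes=15),
                 "streaming replication to a redundant system"),
    "important": (timedelta(hours=4),
                  "frequent incremental backups"),
    "non-critical": (timedelta(days=2),
                     "daily or weekly full backups"),
}

def strategy_for(criticality: str) -> str:
    """Pick the backup strategy implied by a system's tier."""
    rto, strategy = TIERS[criticality]
    return f"{strategy} (RTO <= {rto})"
```

The point of writing the tiers down explicitly is that each level can then be costed and tested on its own terms, rather than treating all systems as equally critical.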
“Each of these, and various levels in between, requires different strategies to meet both business continuity and disaster recovery objectives. There are dozens of ways to proverbially skin that cat,” Weber said.
He said Coalfire often finds that backup or disaster recovery sites do not have the same security protections and controls that production sites do. Penetration tests have shown that budget constraints frequently leave systems used in backup or redundancy capacities without the network security controls that protect the production environment.
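That kind of control gap is easy to surface if both environments keep an explicit control inventory: the gap is just a set difference. A minimal sketch (the control names are invented for illustration):

```python
def missing_controls(production: set, dr_site: set) -> set:
    """Controls present in production but absent at the DR site."""
    return production - dr_site

# Hypothetical control inventories for the two environments.
prod_controls = {"firewall", "ids", "log-monitoring", "mfa", "patching"}
dr_controls = {"firewall", "patching"}

gaps = missing_controls(prod_controls, dr_controls)
```

Anything in `gaps` is a protection an attacker would face in production but not at the recovery site, which is exactly what the penetration tests above keep finding.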