Asynchronous vs. synchronous. Dark disaster recovery vs. active architecture. Active/active vs. active/passive. No setup is objectively better or worse than another. The best one for you depends primarily on your tolerance for what happens when the server goes down.

Security experts say how individual companies choose to save their data in anticipation of an outage depends on how long they can survive before the “lights” are turned back on. What level of availability does your company need? Is the face of your company an ecommerce site where even a few minutes offline can cost an astronomical sum? Will the cost of an active/active system outweigh the potential loss of business from an outage?

“It isn’t about one being more efficient than the other. More to the point of what needs are you trying to solve for. For example, buying a Ferrari to get groceries will get the job done, but is it really fit for purpose?” says Don Foster, senior director of solutions marketing and technical alliances at Commvault.

In an active/active architecture, a cluster of offsite servers is typically synchronized with the onsite server. This avoids downtime when a disaster knocks one server offline, and the setup can be configured to fail over automatically. Less hardware is needed because all the systems across both sites are in use, vs. only half the hardware in a dark disaster-recovery scenario. If production runs on 48 cores and a dark disaster-recovery site mirrors it, you’d have 96 total cores and use only 48. In active/active mode, you scale back to 32 x 2, for 64 cores, and all 64 are active.

In a dark disaster-recovery scenario, the spare capacity is an entirely redundant system – all the hardware and software ready to go – but sitting completely idle.
That capacity is not used at all until the first site fails, but data is replicated to it at set intervals.

Erin Swike, senior cloud solution architect at Bluelock, explains that “active/active disaster recovery is the unicorn of the DR world. The idea of being able to sleep at night knowing that, should your production site fail, your DR site will automatically start serving up applications to users without a single packet lost or moment of downtime, is the nirvana of any CIO or system engineer.

“For most, it remains the thing of fairytales and legends. Forget about the obvious factor of data center proximity and network latency; one of the most important factors is whether your applications are written to support this type of scenario. Unless an application was written with this in mind from the beginning, odds are that it can’t support it,” she said.

The software costs are higher in active/active mode because any system that’s running in active mode must have licensed software. When the system is in dark disaster-recovery mode, the second system does not require paid licenses for database cores, for example, because only one set is live at a time. The fact that the two systems are staying in sync does not affect costs at all.

Synchronous replication requires reliable network connections between the two servers. It also takes extra labor to manage a second site continuously.

The drawback of asynchronous replication is that any data written between the secondary server’s last update and the outage is lost. It can, though, be set up for automatic failover.

Anand Hariharan, vice president of product at Webscale Networks, says this is essentially the concept of cold/warm/hot backup servers. The pros and cons fall into two groups: service-level agreement and cost.
Recovery point objective (RPO) and recovery time objective (RTO) define the SLA a vendor provides: the RPO tells the user how much recent data, measured in time, could be lost in an outage, and the RTO tells them how quickly services will be restored.

“Naturally, with a hot backup, or an active/active architecture, there is zero downtime and a perfect replica of data, so from an SLA perspective, this is a very favorable path to take as it ensures that critical data isn’t lost and critical applications continue to function without interruption,” Hariharan says. “The downside here is of course cost. Maintaining two systems that are always running is essentially twice the cost, whether these costs be related to running replica architectures in a private data center, paying a managed hosting provider to perform the same task in an offsite location, or the cost of running double the instances in the cloud. In some of these scenarios, and depending on the size of the deployment, there is likely also a headcount consideration, where the additional technical staff required to manage twice the systems will also cause a steep increase in costs.”

Costs

At an average (and increasing) cost of $7,900 per minute (Ponemon Institute), downtime is potentially hugely expensive for enterprises, both in immediate business and in long-term reputation.

Other costs include servers at a colocation site. These have the superficial attraction of saving money by distributing infrastructure costs over many users, but a closer look reveals that those savings aren’t realized, according to a ScaleArc white paper. The colocation company still charges for any unused resources, including dark ones that might someday be activated into full use.
Yet enterprises can’t reduce the amount of resources dedicated to the secondary site because all information from the primary server must be backed up to the colo secondary.

The ScaleArc report also notes that, like colocation, public cloud solutions seem attractive owing to their assumed economies of scale. Nevertheless, organizations with security concerns (banks and government agencies, for example) still shy away from the cloud over privacy. Cloud systems can also introduce latency that degrades application performance beyond acceptable levels. And again, cloud economics aren’t always what they seem. Under full operation, cloud expenses typically run higher than when businesses own and run their own infrastructure.

ScaleArc believes that maintenance costs for an active/active architecture are lower because the tasks can be done during work hours rather than requiring a crew in the middle of the night. Those tasks also require fewer staff members because organizations can keep the application running during maintenance, so developers and other application specialists don’t need to be involved.

“For only a 20 percent increase in costs, organizations will enjoy 33 percent more system capacity, along with the additional economic benefits of reduced downtime, lower operational costs, better asset utilization, and likely higher total revenue,” ScaleArc writes.

Customers may not understand computing architectures, but they do want their apps and data to be available, all the time. Any vendor that fails to provide 100 percent uptime risks losing customers and revenue.

Al Sargent, senior director at OneLogin, said that from a financial perspective, topline revenue dwarfs what companies spend on IT. One study shows that companies spend between 3 and 7 percent of revenue on IT.
“Shifting to active/active architectures might increase an IT budget a fraction of a percent, but will prevent outages that could erode revenues by many percentage points,” he said.

Some of these cost downsides are lessened with a cloud-based SaaS solution, where a common management environment can be automatically maintained across both sites. The cloud enables fast scale-out times, so you can deploy a reduced (smaller footprint) failover infrastructure that can restore applications almost instantly during a disaster, enabling a better SLA, Hariharan said.

Foster said both scenarios are valid in an enterprise disaster-recovery strategy. Many applications, and even infrastructure, have developed this technology to make it easier for companies to provide business continuity plans and recovery in the case of an infrastructure outage. Enterprise storage arrays, for example, have created active/active grids through single namespaces that can cross data centers.

“The problem is the cost of maintaining and running these infrastructures. If an application or service has requirements to truly be a ‘dial tone-like’ system (always on – never without) then a business will spend the dollars required to ensure the five nines of availability and then some,” he said.

Most critical applications with these needs have these types of failover mechanisms built in so that secondary or tertiary systems can take over if one fails, Foster added. Clustering has also been around for a long time for servers, and as that technology has moved down the stack into the infrastructure services, the ease with which availability can be provided has greatly improved – just at a cost.

He said cost is not the only downside, however.
“Active-active recovery solutions do not account for user error. They are garbage in garbage out, and in the event of this type of an outage, you need to have something that is tracking point in time consistency of the data to recover back to. The GitLab outage from a few weeks ago is a great example of this,” Foster said.

“There could be any number of mission-critical applications worth the protection of active/active redundancy, the trick is determining those that merit the expense,” said Steven Hill, senior storage analyst with 451 Research. “It’s important to remember that a good DR/BC plan calls for a broad assessment of a company’s key business priorities; the personnel, data and applications necessary to support them; and the cost of alternative options available to replace them – all weighed in a cost/benefit analysis against the risk of loss and likelihood of a critical business interruption.”

Pros and cons

Don Foster of Commvault offers the upsides and downsides of an active/active architecture.

Pros:
- Fast or instant recovery – often without any downtime
- Duplicates the production environment to ensure service portability regardless of the infrastructure outage
- Often is provided as a part of the application offering (Oracle RAC, DAG for Exchange, etc.)
- Easy to operate once set up

Cons:
- Expensive – Must duplicate the infrastructure and always have it running. This makes this scenario very expensive in the cloud
- Does not recover in the case of data corruption (non-infrastructure event)
- Requires more solutions to truly “protect” a service or application from an outage of any type
- Does not provide versioned copies of data
- Hard to set up and maintain for a complex service or application

Dark disaster recovery is more cost effective, is typically data outage focused, and can be very complementary to the built-in active recovery services, Foster noted.
The infrastructure would be highly available, with data copies tracked through real-time and versioned point-in-time references to resolve any outage that may arise.

ScaleArc’s CEO Justin Barney believes an assessment of the costs for an active/active architecture must take into account the potential losses from downtime. “Active/active operations do cost a bit of a premium – about 20 percent in hardware and software costs. But those additional costs don’t include offsets from sources such as revenue losses averted because of avoided downtime. Overall, the perspective that active/active operations are warranted only for organizations that can’t afford downtime is true,” he said.

Barney said that with demand for continuous availability dominating nearly every industry, active/active operations clearly provide the best mix of advantages.

There’s new data showing that the backup systems and processes enterprises have relied on the most to ensure business continuity/disaster recovery might actually be hurting, not helping, when it comes to preventing major outages, according to Barney.
“This is important now because these disaster recovery systems are no longer meeting the needs of organizations that must achieve ‘continuous availability.’”

“Today’s enterprises don’t have the luxury of failing and then recovering from that failure when going offline isn’t an option – and so the ‘dark DR’ model fails them,” he adds.

Foster disagrees with that statement. “If you are still operating backup and recovery and DR like it is 2005, then yes that statement may be correct, but the reality is that customers are modernizing how they execute on DR and backup as their infrastructures and architectures have matured and changed. When they don’t do this, outages can occur due to the non-integrated fashion in which protection and DR decisions are made.”

In addition, during a failover the primary server’s normal workflow must be redirected to the secondary server, which becomes, at least temporarily, the new primary server. This redirection can require significant amounts of manual configuration, with two IT teams (one at each location) working overtime to enable and troubleshoot the switch. Similar reconfiguration applies to DNS, networking, replication topology, and other infrastructure elements. Testing requirements are massive, and additional IT staff must step into place at the secondary facility while the original IT team remains pinned down trying to get the primary facility back online.

“Of course, as we’re watching the big trends around ‘software is eating the world’ and ‘every company is becoming a software company,’ there are fewer and fewer organizations for whom downtime is acceptable. DR often means at least several minutes, if not more, of downtime, and of course, because you’re bringing an idle system online all of a sudden, it may not start operations smoothly.
But yes – active/active architectures are best suited for organizations that cannot tolerate downtime,” Barney said.

Joseph George, vice president of product management at Sungard AS, said he wouldn’t frame the debate between the two architectures purely in terms of efficiency, because the biggest deciding factor in which resiliency tier a business selects is often what it can afford. “Clearly, if cost was not a factor, every business would have [high-availability] systems. But they typically can only afford (and need) that level for the most mission critical systems and applications,” he said.

“It is important for enterprises to ‘tier’ their applications to help manage the economic balance between risk and the investment to mitigate. Tiering applications, as well as mapping their interdependencies, enables optimal recovery order sequencing and allows for the most cost effective availability program for the level of application downtime and data loss the organization can afford based on business impact,” he added.

Warm DR is fine

Swike said the majority of enterprises don’t really need active/active DR. Warm DR meets their needs. With appropriate bandwidth between sites, an RPO of seconds and a technical RTO of minutes to hours is very achievable. “The technology is only part of the story though: there has to be discipline and time given to the process of DR. Having servers replicated is a great step, but if you don’t test it regularly how would you know it’s even going to work?”

For many, DR is number 11 on their top 10 list of priorities, she said. “That in no way means that they don’t care about DR.
It’s just that day-to-day issues and production projects always tend to be at the top of the list.”

Mike Weber, vice president of Coalfire’s labs division, said the key to a solid backup strategy fundamentally depends on the business needs and mission criticality of the system. Tiered models range from critical data with a very short RTO measured in minutes, which requires streaming backups and/or replication to a redundant (but not high-availability) system, to non-critical data that can absorb a recovery measured in days.

“Each of these, and various levels in between, requires different strategies to meet both business continuity and disaster recovery objectives. There are dozens of ways to proverbially skin that cat,” Weber said.

He said Coalfire often finds that backup or disaster recovery sites do not have the same security protections and controls that production sites do. Penetration tests have found that when systems serve in various backup or redundancy capacities, budget constraints often leave them without the network security controls that protect the production environment.
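
The application tiering that George and Weber describe can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the tier names borrow Hariharan’s hot/warm/cold vocabulary, while the `App` record, the function names, and the eight-hour cutoff are hypothetical choices made for this example and do not come from any of the vendors quoted. It assigns each application the least expensive tier that still meets its tolerances, then orders recovery so the least downtime-tolerant systems come back first.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "hot (active/active)"
    WARM = "warm (replicated standby)"
    COLD = "cold (restore from backup)"


@dataclass(frozen=True)
class App:
    name: str
    rto_minutes: float  # longest tolerable time to restore service
    rpo_minutes: float  # longest tolerable window of lost data


# Hypothetical cutoff, chosen only for illustration: anything that must be
# back within eight hours gets a replicated standby.
WARM_RTO_LIMIT_MINUTES = 8 * 60


def choose_tier(app: App) -> Tier:
    """Return the least expensive tier that still meets the app's tolerances."""
    if app.rto_minutes <= 0 or app.rpo_minutes <= 0:
        # No tolerance for downtime or data loss: only active/active fits.
        return Tier.HOT
    if app.rto_minutes <= WARM_RTO_LIMIT_MINUTES:
        return Tier.WARM
    return Tier.COLD


def recovery_order(apps: list[App]) -> list[str]:
    """Sequence recovery so the least downtime-tolerant apps come first."""
    ranked = sorted(apps, key=lambda a: (a.rto_minutes, a.rpo_minutes))
    return [a.name for a in ranked]


if __name__ == "__main__":
    # Made-up portfolio for demonstration.
    portfolio = [
        App("monthly-reports", rto_minutes=2 * 24 * 60, rpo_minutes=24 * 60),
        App("storefront", rto_minutes=0, rpo_minutes=0),
        App("crm", rto_minutes=60, rpo_minutes=1),
    ]
    for app in portfolio:
        print(f"{app.name}: {choose_tier(app).value}")
    print("Recovery order:", recovery_order(portfolio))
```

A real tiering exercise would also map the interdependencies George mentions, since an application cannot be recovered before the systems it relies on; this sketch ranks applications only by their own RTO and RPO.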