by James Geis

Recovery: The Only Reason for Backup

Feb 04, 2005 | 8 mins
CSO and CISO, Data and Information Security

While recovery may not be a new reason to back up information, it remains the only reason for backing up information. Unless information can be recovered with pinpoint accuracy within a certain time frame, the whole backup paradigm is useless and an organization could suffer significantly. Data corruption, physical disaster, media disintegration, and compliance regulations are just a few reasons that organizations are looking for new ways to improve data protection and the associated operational processes. Organizations can insure physical assets against most natural disasters, but they cannot insure the most important asset: information and the information's integrity.

With regulatory complexity, the desire to recover rapidly, advancing technology, and vendor fear, uncertainty, and doubt, how can organizations address the challenges of recoverability with cost effectiveness and operational efficiency? What policies, practices, and audit trail processes should an organization have in place to ensure recovery is possible? Where do backup, replication, archiving, and disaster recovery merge and diverge? Proper planning and regular testing of an organization's recovery plan are essential to maintaining effective business continuity. Having answers to these questions via an effective plan brings confidence not only to the system administrator but to the entire organization that participates in the protection of information.

Backup and recovery requirements have changed due to compliance regulations, increasing amounts of information, and the operational challenges associated with protecting large heterogeneous storage pools with fewer human and computing resources. The good news is that the maturation and merging of backup software and data replication, along with application integration and technologies such as nearline, networked storage, and archiving, have created an environment that enables innovative and streamlined solutions to achieve shorter backup and quicker recovery windows. The not-so-good news is that just because information has been copied to an alternate medium doesn't mean that it can be restored or that it is valid. Organizations need to consider every recovery scenario. If an organization had to perform a partial or full recovery of data, a specific application, a system, or a site, would it be prepared for each circumstance?

Information restoration is only a single element of an effective disaster recovery and business continuity plan. It's important not to confuse restore and recovery. Restore is the copying of information from backup to primary storage; recovery is the collection of other processes that ensure the information is in a usable state to resume production. Information is useless unless recovery point and time objectives can be met with valid data. Those objectives are, in turn, reliant upon detailed knowledge of the information as it relates to the application and all elements surrounding its use. Backup, restore, and recovery are strategic, coordinated organizational events, not just a tactical system administrator function. These three terms mean different things to a database administrator, system administrator, application owner, and end user. Therefore, testing a full-scale restore must be coordinated with the measures, metrics, and verification that show the recovered information is valid and in the expected state. When a database is quiesced, for example, what benchmark or statistic is captured at the time of backup so that, if you recover to an exact point, you know you succeeded?
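One way to make that last question concrete is to record a small set of statistics at backup time and compare them after the restore. The following is a minimal sketch, not a prescribed method: the `BackupFingerprint` record, its fields, and the sample data are hypothetical, and a real application would choose statistics that are meaningful to the application owner.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupFingerprint:
    """Statistics captured at backup time (hypothetical record layout)."""
    sha256: str      # digest of the backed-up bytes
    size_bytes: int  # total size of the backed-up data
    row_count: int   # application-level statistic, e.g. rows in a key table


def fingerprint(data: bytes, row_count: int) -> BackupFingerprint:
    """Record what the data looked like at the moment it was captured."""
    return BackupFingerprint(hashlib.sha256(data).hexdigest(), len(data), row_count)


def recovery_is_valid(at_backup: BackupFingerprint,
                      after_restore: BackupFingerprint) -> bool:
    """Treat a restore as a recovery only if every captured statistic matches."""
    return at_backup == after_restore


# Illustrative data only: a tiny "table" serialized as bytes.
original = b"id,amount\n1,100\n2,250\n"
captured = fingerprint(original, row_count=2)

good_restore = fingerprint(original, row_count=2)
bad_restore = fingerprint(original[:-4], row_count=1)  # truncated copy

print(recovery_is_valid(captured, good_restore))
print(recovery_is_valid(captured, bad_restore))
```

A byte-level digest only shows that the copy is intact; the application-level statistic is what lets the information owner, not just the system administrator, confirm the data is in the expected state.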

Policy continues to be a challenge for many organizations, but it is necessary to provide clear and concise guidelines for circumstances that can often have vague and ambiguous directives. Where the legal system and compliance are concerned, courts have been much more lenient on organizations with active policies, detailed audits for compliance, solid documentation, and follow-through. Compliance regulations can generally be summarized as requirements for the availability, authenticity, accessibility, integrity, and security of information, along with the documentation and audit processes that ensure procedures are in place to deliver all of the above. All of these rely on an organization's ability to recover information.

In relation to backup and recovery, information policy should include, but not be limited to, provisions and direction for:

  • Change control management
  • SLAs for information availability
  • Backup frequency, schedules, and off/online access to all types of information classified by importance
  • Recovery point and time objectives in association with business continuity planning and disaster recovery
  • Tape retention, recycling, offsite, and rotation schedules
  • Data replication (in and out of storage frame) for disk and tape: how many copies are necessary?
  • Treatment of production, test, and development data for backup, restore, and order of recovery in disaster
  • Guidelines for the checkpoints, schedule, and treatment of all information: mission critical, disposable, electronic messages, user data, home directories, etc.
  • Guidelines about what information is saved and why: an .mp3 or .wmv may be important to a media company, but not to a financial services company
  • Security of information copies (physical and logical access)
  • Rules for the data retention and archival process: how long is information retained, and on what medium? What is the access time and what information gets moved from primary to secondary or tertiary storage, and why?
  • How do primary, secondary, and tertiary storage get protected: on what medium and at what frequency?
  • Owner of the processes and practices for backup and recovery: it's not just the system administrator; it must involve the application or information owner
  • Audit trail to ensure policies, processes, and practices are adhered to and maintained
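
Several of these provisions can be written down as data and checked automatically, which also leaves an audit trail. The sketch below is only an illustration under stated assumptions: the `DataClassPolicy` record, the class names, and every interval are hypothetical, and the check covers a single rule (a backup interval no longer than the recovery point objective).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataClassPolicy:
    """One class of information and its protection rules (hypothetical fields)."""
    name: str
    backup_interval: timedelta  # how often a recoverable copy is taken
    rpo: timedelta              # recovery point objective: tolerable data loss
    copies: int                 # number of backup copies kept
    owner: str                  # who is accountable for recovery

    def meets_rpo(self) -> bool:
        # Worst case, a failure strikes just before the next backup,
        # so everything since the previous backup is lost.
        return self.backup_interval <= self.rpo


# Illustrative values only.
policies = [
    DataClassPolicy("mission critical", timedelta(hours=1),
                    timedelta(hours=4), 3, "application owner"),
    DataClassPolicy("user home directories", timedelta(days=1),
                    timedelta(hours=12), 2, "system administrator"),
]

violations = [p.name for p in policies if not p.meets_rpo()]
print(violations)
```

Here the daily backup of home directories cannot satisfy a 12-hour recovery point objective, so either the schedule or the objective needs to change.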

A consistent information challenge for organizations is availability. Advancements in both hardware and software technology must be supported with solid process. With the acceptance of nearline storage and of in-frame and out-of-frame data replication, coupled with long- and short-distance replication, having full backups available on disk is becoming a more realistic alternative for organizations to consider. Replication allows multiple point-in-time copies of information that can be directly accessible or rapidly restored. At the very least, using nearline pools as a staging location prior to tape seems logical.

Many organizations deliberate about full or incremental backups. Should an organization perform full backups on a daily basis, or should it baseline a full backup and perform incremental backups? Both are acceptable, and each has advantages and disadvantages: daily full backups lengthen the backup window, while a baseline plus incrementals lengthens the restore. In either circumstance, an organization must ensure that both processes are solid. Readily available full backup copies are desirable in most circumstances, and their operational ease and cost are making the case for this solution. Even remote copying to alternate sites is becoming easier with remote vaulting of tapes and synthetic backups and restores. Many organizations are considering multiple alternatives because having only one copy of information on a tape still carries significant risk; many organizations duplicate tapes for this exact reason. But remember to ask: what happens if the backup itself is not good? While tape is not dead, its price point is being approached by some disk alternatives, and offsite archiving is also an option.
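
The full-versus-incremental trade-off can be roughed out with simple arithmetic. The sketch below makes simplifying assumptions that are not from the article: restore time depends only on data volume and a constant throughput, every incremental since the last full backup must be replayed, and all sizes and rates are illustrative.

```python
def restore_hours_full(full_gb: float, throughput_gb_per_hour: float) -> float:
    """Restore time when the most recent backup is a full copy."""
    return full_gb / throughput_gb_per_hour


def restore_hours_incremental(full_gb: float, incremental_gb: float,
                              days_since_full: int,
                              throughput_gb_per_hour: float) -> float:
    """Restore time for a baseline full plus each daily incremental since."""
    total_gb = full_gb + incremental_gb * days_since_full
    return total_gb / throughput_gb_per_hour


def weekly_backup_gb(full_gb: float, incremental_gb: float,
                     daily_full: bool) -> float:
    """Data written per week under each strategy."""
    if daily_full:
        return 7 * full_gb
    return full_gb + 6 * incremental_gb


# Illustrative numbers only.
FULL_GB, INCR_GB, RATE = 1000.0, 50.0, 200.0

full_restore = restore_hours_full(FULL_GB, RATE)
incr_restore = restore_hours_incremental(FULL_GB, INCR_GB, 6, RATE)
full_volume = weekly_backup_gb(FULL_GB, INCR_GB, daily_full=True)
incr_volume = weekly_backup_gb(FULL_GB, INCR_GB, daily_full=False)

print(full_restore, incr_restore)  # hours to restore on day 6
print(full_volume, incr_volume)    # GB written per week
```

With these numbers, daily full backups write 7,000 GB a week against 1,300 GB for the incremental scheme, but the incremental restore on the sixth day takes 6.5 hours rather than 5. Which side of the trade-off hurts more depends on whether the backup window or the restore window is the constraint.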

Open systems are taking a lesson from 1970s mainframe ideals by introducing virtual tape. Virtual tapes are closely analogous to nearline disk because they can be used as a staging location before the copy to tape. While this may seem like extra process steps, keep in mind that rapid restore is the goal.

When an organization develops any processes, regardless of the technologies, it should take the following into consideration:

  • Develop a comprehensive plan for addressing all backup and recovery facets. Identify key participants and define those roles. Document processes and procedures, assign ownership, and continuously validate that they work.
  • Understand the contingency for every possible human and non-human event.
  • What data needs to be backed up frequently?
  • Determine the RPO and RTO for each class of data.
  • What data should be replicated (local and remote) for business continuity or rapid restore?
  • When determining architecture and alternatives, consider distance to alternate sites and factor that into restore (retrieving data over a network or via tapes being transported).
  • Identify key participants in the recovery process. What is the communication plan, both internally and externally?
  • When information is archived to tape, nearline, or content addressable storage, how is that pool protected? How do its backup and recovery issues differ from production?
  • Make architectural and purchasing decisions on technology that will serve the purpose for the long run and provide scalability. Tried and true technology, and even some newer technology that fits into existing architectures, can decrease cost over the long run and increase operational efficiency.
  • Is the cost of having a rapid restore from disk greater or less than the productivity time and revenue lost while waiting for a tape restore?
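
The last question in the list lends itself to a back-of-the-envelope calculation. The following sketch assumes a simple model that is not from the article: a fixed annual premium for keeping restore copies on disk, a constant hourly loss while systems are down, and a known number of restore incidents per year. All figures are illustrative.

```python
def avoided_loss(disk_hours: float, tape_hours: float,
                 loss_per_hour: float, incidents_per_year: float) -> float:
    """Annual productivity and revenue loss avoided by restoring from disk."""
    return (tape_hours - disk_hours) * loss_per_hour * incidents_per_year


def disk_restore_pays_off(annual_disk_premium: float, disk_hours: float,
                          tape_hours: float, loss_per_hour: float,
                          incidents_per_year: float) -> bool:
    """True when the avoided loss exceeds the extra cost of disk."""
    saved = avoided_loss(disk_hours, tape_hours, loss_per_hour,
                         incidents_per_year)
    return saved > annual_disk_premium


# Illustrative numbers only: 1-hour disk restore vs. 9-hour tape restore,
# one major restore per year, 50,000 extra per year for disk.
high_impact = disk_restore_pays_off(50_000, 1, 9, 10_000, 1)
low_impact = disk_restore_pays_off(50_000, 1, 9, 5_000, 1)

print(high_impact, low_impact)
```

The same premium is justified for an application that loses 10,000 per hour of downtime and not for one that loses 5,000, which is why the recovery time objective has to be set per class of data rather than once for the whole organization.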

Unless information restoration and recovery have been tested and the process validated, an organization doesn't have an effective backup plan. Generally, restores are done at the individual element level, whereas a full-scale application-level restore is an operational process that involves many individuals, processes, and tools to get information back to the state an organization needs in order to resume operations. Many organizations have not performed a large-scale recovery, and are hoping they will never be faced with that task. However, the risk associated with apathy has a cost much greater than the cost of the technology.

With backup and data replication merging, nearline storage coming to a price point that is more attractive, and the advancement of archival tools, organizations should integrate these elements into an IT strategy that will advance recovery objectives. By focusing on optimizing assets and personnel, and being fully prepared for information recovery, an organization has a greater chance of recovering information that has integrity.

James E. Geis is director, storage solutions, for Forsythe. Geis developed Forsythe's unique information management framework, the roadmap Forsythe uses for information and storage consulting engagements.