Can You Really Count On Your Backups?
by Philip Jan Rothstein, FBCI
Copyright (c)2003, Rothstein Associates Inc. All Rights Reserved.
Backing up essential business data is kind of like getting enough exercise or eating the right foods - more effort seems to go into talking about it than doing it.
This article originally appeared in Information Security Magazine.
A manufacturing CIO had a tough time explaining to the CEO what should have been a straightforward recovery from a drive failure on a critical server supporting a prominent business line. The CEO could not seem to comprehend the CIO's awkward rationalization for losing track of the most current disaster recovery backups just when they were needed most. The CIO (who for obvious reasons shall remain anonymous) tried valiantly to explain how those backup tapes were used all the time for file restores requested by individual users and were somehow misplaced. The CIO is doing a better job managing backups for his new employer.
After the tragic Oklahoma City bombing, the area around one professional service company's offices was cordoned off for several days. Even though they suffered no direct damage, employees could not get into the office to retrieve backup tapes which were still in the computer room.
After the January 1998 ice storm knocked out power, data center recovery for one Canadian company was complicated because the most recent backups on hand were from the prior weekend. The most current backups were still at the impacted site - and inaccessible.
All of the data for the pharmacy system of a major, metropolitan medical center was irrevocably lost after a disk crash, when MIS staff first discovered that the backup tapes produced after every eight-hour shift for several years were of the wrong disk volume.
Preaching to the Choir?
One would expect readers of Information Security Magazine to be among the most enlightened, competent and vigorous in protecting their enterprise's data from disclosure, tampering or destruction. But even the most seasoned information security practitioners sometimes overlook flaws in their organizations' programs to prevent loss or destruction of that essential data.
Rothstein Associates Inc. conducted a telephone poll in June 1998 of 21 data center directors or CIOs of enterprises ranging in size from $20 million to $1 billion in gross sales, and from client/server to large mainframe environments. These particular companies were selected because they might be considered among the more progressive or enlightened organizations practicing information security.
Each of the 21 IT executives surveyed claimed to have a comprehensive data backup program in place. That claim made these actual practices all the more disturbing:
- 5 of 21 transport backup tapes off-site the same day they are created.
- Only 6 of 21 transport backup tapes off-site daily.
- 4 of 21 transport backup tapes off-site once or twice weekly.
- 2 of 21 transport backup tapes off-site monthly.
- 9 of 21 do not store backup tapes off-site at all.
- 4 of 21 run backups not more than once a week.
- 2 of 21 employ some form of electronic vaulting for their most critical systems.
- Only 1 of 21 has validated their backups by conducting and auditing full-scale restores.
- Only 2 of 21 had conducted a thorough assessment within the past two years to match backup procedures with current business requirements.
- Only 1 of the 21 made a clear distinction between operational backups, disaster recovery backups and archival backups (discussed later in this article).
- 15 of 21 routinely skipped data files or data bases which were being accessed during their backup cycles, with no process in place to ensure that every mission-critical file or data base was backed up at all, let alone in a form which could be resynchronized with other data files or processes.
The obvious conclusion from this limited sampling is that data backup practices are clearly not getting the attention devoted to other aspects of information security.
From Whence We Came
Data backup technologies have come a long way since early mainframe days when open-reel tape was the only practical backup vehicle. Mass volumes of open-reel tapes have been supplanted by pocket-size media with orders of magnitude more capacity and reliability such as DAT, DLT and MAGSTAR. Of course, the volume of data to back up has grown as well, constantly stretching the limits of backup media and speed.
In early mainframe days, moving the backup tapes off-site was sometimes referred to as "CTAM" - or, "Chevy Truck Access Method," after such mainframe data access acronyms as BTAM, VTAM, VSAM or BDAM. Of course, few mainframe shops ran 24x7x365 then, and there was ample time to fill the back of that Chevy truck with case after case of those backup tapes after the typical nightly batch cycles.
Optical media, electronic vaulting (sometimes called televaulting), remote mirroring or journalling, and other enabling technologies have begun to supplement, if not supplant, magnetic media as the principal backup vehicle, particularly for high-risk, financial applications.
On the other hand, actual backup practices have fundamentally changed far less than one might think, given the level of attention to electronic vaulting and other new technologies.
In general, "CTAM" is still the most popular - and most dependable - backup method. "Mainframe people have known this for a long time." Ross Kinkade, Sales Manager for Safe, Inc., a provider of off-site storage as well as electronic vaulting services, observes, "Televaulting is out there, but still not as cost-effective or convenient as tape backup. Tapes are getting smaller and more condensed. Many organizations have been wavering for a long time on televaulting, but costs for communications bandwidth, televaulting software and equipment, and technology staff to make televaulting work are not coming down nearly fast enough to be consistently competitive with magnetic tape."
Why back up data at all? Intuitively, the answer is to be able to salvage it when something goes wrong. In reality, data backups serve at least three overlapping and sometimes contradictory purposes: operational recovery, archival retention and disaster recovery. Big problems can occur when these distinctions are ignored, as is often the case.
Different types of backups serve explicit and not necessarily compatible purposes, yet the distinction among them is often blurred. As a result, the risk of backups failing when they are most desperately needed is very real.
Operational backups are created and maintained for routine, day-to-day recovery. Operational backups are often kept on-site and are especially convenient for individual file restores. Operational backups tend to be short-term, typically retained for a few days to a few weeks.
Operational backups sometimes serve as a fallback for disaster recovery backups, although the disaster recovery program should not depend on them.
Archival backups are intended for long-term protection of data which may be required at some time in the future for legal, regulatory, tax or business reasons. Source program code for production systems, contracts, financial data used for tax preparation purposes, customer orders and confirmations of deliveries are all examples of data which may be archived.
Archival backups tend to be retained very long-term, as much as seven to ten years. Longevity and readability of the storage medium are often a consideration, since some magnetic media begin to deteriorate in three to five years, especially under less than optimal storage conditions.
Archival backups may remain on-site, although they are generally stored off-site. Archival backups are generally of little value for disaster recovery.
Disaster recovery backups are intended solely for recovery from a substantial disruption, ranging from loss of a single disk volume to an entire data center. Synchronization of all necessary backup volumes or files is often a concern, since it is not always feasible to back up an entire system (or group of systems) at one time.
Disaster recovery backups should always be moved off-site and secured as soon as they are created, and, if magnetic media is employed, should be stored in a climate-controlled environment comparable to the computer environment where they are created.
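The three backup classes described above can be written down as explicit policy records, which makes it harder to blur them in practice. The following is only a sketch: the class and field names are hypothetical, and the retention figures are illustrative assumptions (the text gives "a few days to a few weeks" for operational backups and seven to ten years for archival backups, and states no retention period for disaster recovery backups).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupPolicy:
    purpose: str
    retention: timedelta          # illustrative value, not a recommendation
    off_site: bool                # should copies leave the building?
    supports_file_restores: bool  # may copies be pulled for routine restores?


# Hypothetical policy table reflecting the three classes in the text.
POLICIES = {
    "operational": BackupPolicy(
        "routine, day-to-day recovery", timedelta(weeks=3), False, True),
    "archival": BackupPolicy(
        "legal, regulatory, tax or business retention",
        timedelta(days=365 * 7), True, False),
    # The retention period below is an assumption; the article gives none.
    "disaster_recovery": BackupPolicy(
        "recovery from a substantial disruption", timedelta(weeks=4), True, False),
}


def may_recall_for_file_restore(backup_class: str) -> bool:
    """Only classes flagged for routine restores should be recalled on-site."""
    return POLICIES[backup_class].supports_file_restores
```

A check such as may_recall_for_file_restore("disaster_recovery") returning False is one way to encode the rule that disaster recovery copies are not borrowed for individual file restores.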
Will They Work?
Ironically, data backups are prone to failure just when you need them most - when you are attempting recovery.
The most common data recovery failure this author has observed occurs when the restoral process has never been tested. Recovering a crashed system is the worst time to discover the backups are not usable, or that the backup/restore software has a bug! In this author's personal experience, a popular, PC-based backup software package had a bug which prevented restoral of one critical file. These days, this author employs two distinct drive technologies (Digital Audio Tape, or DAT, plus Travan), two different backup software packages, and alternates devices and software to produce four different combinations of backups regularly. If any one drive, medium, or software product fails, that still leaves several data recovery options.
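One way to test a restore is to restore into a scratch directory and compare file checksums against the original tree. The sketch below assumes both trees are reachable as ordinary directories and that the source has not changed since the backup was taken; in practice the comparison would more likely be made against checksums recorded at backup time. The function names are hypothetical and not part of any backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source_root: Path, restored_root: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the restored tree."""
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    return {
        "missing": sorted(set(src) - set(dst)),
        "corrupt": sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n]),
    }
```

An empty "missing" and "corrupt" list is the result to look for; anything else is a restore problem found before a real recovery depends on it.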
The second most common data recovery failure occurs when the backups stay on site, and are not accessible when they are needed; or, when the backup media cannot be located. The classic example occurs when disaster recovery backups are recalled back on-site for routine file recovery, and then are misplaced or rendered unusable for disaster recovery.
The third most common data recovery failure occurs when the data backup process had never been tailored to the business need: backups run when data files or data bases are open, and critical files are skipped; backups are not run at the most sensitive or appropriate points in the processing cycle; full-disk-volume backups are run when application-specific backups of an entire application's files would be more appropriate, or vice versa; or, data backups for critical business functions simply are not run at all.
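The "skipped because it was open" failure can be caught by comparing a list of mission-critical files against what a backup run actually wrote. This is a minimal sketch under the assumption that the backup software can report which files it wrote and which it skipped; the function and argument names are hypothetical.

```python
def uncovered_critical_files(
    critical: set[str],
    backed_up: set[str],
    skipped_open: set[str],
) -> dict[str, list[str]]:
    """Classify critical files that did not make it into the backup."""
    # Skipped because the file was in use during the backup window.
    skipped = (critical & skipped_open) - backed_up
    # Not mentioned by the backup job at all: never selected for backup.
    never_selected = critical - backed_up - skipped_open
    return {
        "skipped_while_open": sorted(skipped),
        "never_selected": sorted(never_selected),
    }
```

Either list being non-empty after a nightly run is the kind of exception that should be followed up rather than left in a log.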
One company suffered considerable loss and impact because one disgruntled technician had access to both the primary, production data and all of the backups, and essentially held all of the firm's data hostage. Kinkade admonishes, "Don't trust any one individual with access to production and to backup data. Use separate passwords, limit authorizations for access to off-site backups, have your auditors check for opportunities where one person could maliciously or carelessly wipe out both your live system and your ability to recover vital data from backups."
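The audit Kinkade describes amounts to looking for anyone who appears on both access lists. A sketch, assuming the two lists can be exported as sets of user names (the names below are made up for illustration):

```python
def dual_access(production_users: set[str], backup_users: set[str]) -> list[str]:
    """Return everyone who can reach both production data and the backups."""
    return sorted(production_users & backup_users)


# Hypothetical access lists for illustration only.
production = {"alice", "bob", "carol"}
backups = {"carol", "dave"}
violations = dual_access(production, backups)
```

Here violations would contain only "carol", the one account able to damage both the live system and the means of recovering it.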
Kinkade also suggests that a bonded, insured, audited third-party media retention facility be contracted to store critical backups. Safe deposit boxes, employees' homes, other company-controlled locations or records retention centers (which do not typically offer separate, secured climate-controlled environments for magnetic media) cannot offer the same combination of security, rapid accessibility and control. Media retention facilities also offer rapid access and delivery of backup media in the event of a recovery.
From individual PC to mainframe environments, there is a broad range of backup options and technologies available. Many of these options are leveraged for both performance and capacity with hardware- or software-driven data compression processes.
The most popular tape backup options for individual workstations utilize TRAVAN or QIC, quarter-inch tape cartridges. These offer modest device costs (in the $100-300 range), although media cost may be a little higher than other media such as DAT ($20 - 40 per cartridge). Tape capacities range as high as ten gigabytes, although backup speed is not nearly as fast as other tape technologies. Parallel-port, plug-in models are available, offering an especially cost-effective approach for low-volume, multiple workstations or laptop computers.
Removable-media disk devices such as Iomega's Zip and Jaz drives, SyQuest's SparQ and SyJet 1.5GB, and Imation's LS-120, among others, are increasingly popular for small-office, home-office, telecommuting, standalone and mobile employees. These technologies offer 100 megabytes to two gigabytes per removable disk. Device costs range from as low as $100 or so up to the $700 range. Media costs range from $10 to as much as $100, making these devices less desirable for volume backups.
CD-R (recordable) and CD-RW (rewriteable) devices are becoming more common, typically affording 650 megabytes per CD. Cost per CD is low, on the order of $1-5.
DAT, or Digital Audio Tape, drives are often employed for high-end workstations or client/server environments. DAT technology offers good reliability and data integrity at a very modest media cost. 4mm and 8mm drives range in cost from around $600 to almost $2,000, with jukeboxes capable of swapping multiple tapes in and out for unattended, multi-tape backup operations costing around $3,000-6,000. Media costs are low, ranging from $6 to $30 per tape. Performance is considerably faster than quarter-inch tape.
DLT, or Digital Linear Tape, offers even higher capacities and speed than DAT, ranging as high as 40 gigabytes per tape (compressed). Costs can range from $2,500 to $10,000, including jukebox options, with media costs ranging from $40-120.
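Media prices are easier to compare as cost per gigabyte. The arithmetic below uses only the two media for which the text gives both a capacity and a price range: CD-R (650 megabytes, $1-5) and DLT (up to 40 gigabytes compressed, $40-120). Pairing the top DLT capacity with the whole price range, and treating 650 megabytes as 0.65 gigabytes, are simplifying assumptions.

```python
# (capacity in GB, low media price in $, high media price in $), from the text.
MEDIA = {
    "CD-R": (0.65, 1.0, 5.0),
    "DLT (compressed)": (40.0, 40.0, 120.0),
}


def cost_per_gb(capacity_gb: float, low: float, high: float) -> tuple[float, float]:
    """Return the (low, high) media cost per gigabyte."""
    return (low / capacity_gb, high / capacity_gb)


for name, (capacity, low, high) in MEDIA.items():
    lo_cost, hi_cost = cost_per_gb(capacity, low, high)
    print(f"{name}: ${lo_cost:.2f} to ${hi_cost:.2f} per GB")
```

With these figures DLT works out to $1.00 to $3.00 per gigabyte and CD-R to roughly $1.54 to $7.69, before counting drive cost or the labor of swapping media.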
An intriguing backup option is the use of internet-based data backup services. Costs range from a flat fee on the order of ten dollars/month, on up depending on data volume and frequency. The backup service provides either optical or magnetic media storage, with appropriate security and access controls.
Electronic vaulting has received considerable press in recent years. In theory, the ability to move data to a remote site through a network seems like an ideal backup tool. In practice, according to Safe, Inc.'s Kinkade, "... electronic vaulting technology always seems a step behind where we would like it to be. Many organizations talk about vaulting, but the only ones actively doing this are in the high-dollar environments such as banking or brokerage, where minutes lost can mean millions of dollars." Kinkade's rationalization for this slow trend to e-vaulting is in part that "... magnetic media costs have been consistently trending downward far faster than network bandwidth costs. Plus, e-vaulting depends on LAN administrators, whereas tape backups are not particularly labor-intensive - and, in a tight, technology labor market, the people resources to implement, debug and operate e-vaulting are a big concern."
At the high end of data backup options for some of the high-dollar-risk environments are mirroring and remote journalling. Mirrored data files or transactions are transmitted synchronously or asynchronously to a remote system which is used for disaster recovery. Options range from keystroke-level journalling up to periodic transmission of entire data bases. For mirroring or journalling to work well, hardware or cache-level dual-write functions may be employed, although some sophisticated applications software systems incorporate journalling or mirroring capabilities directly. In considering remote journalling or mirroring, Kinkade cautions, "look for open, 'plug-and-play' architectures. Don't get locked into proprietary technologies."
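To make the asynchronous case concrete, here is a toy illustration of journalling: every write is applied locally and appended to a journal, and the journal is later shipped to a remote copy and replayed in order. It is a sketch of the idea only, with hypothetical names, and does not represent any vendor's product or the network transport involved.

```python
class JournaledStore:
    """Primary data store that journals writes for later shipment."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.journal: list[tuple[str, int]] = []  # writes not yet shipped

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.journal.append((key, value))

    def ship(self, replica: dict[str, int]) -> int:
        """Replay pending journal entries on the replica, in order."""
        shipped = len(self.journal)
        for key, value in self.journal:
            replica[key] = value
        self.journal.clear()
        return shipped
```

Until ship() runs, the replica lags the primary; that lag is the data at risk in an asynchronous scheme, whereas a synchronous scheme would update the remote copy as part of each write.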
What Works Where?
One common failing of data backup approaches is the assumption that backups are forever. Magnetic media, even under optimum conditions, has a finite, usable life. Depending on storage and handling conditions, quality and type of media, and recording methods, magnetic media may deteriorate rapidly or be accessible as long as 7-9 years. Magnetic media, therefore, are well-suited to operational or disaster recovery, but may not be ideal for long-term, archival storage.
Optical media such as CD-R are expected to have longer usable life, on the order of a decade or more. Therefore, they are better suited to archival requirements.
One pitfall often encountered in long-term data retention is the dependence on specific software tools, devices or forgotten people skills to utilize the archival data. How many IT shops can read and process seven-track, 556-bits-per-inch, open-reel magnetic tapes, or even 5¼-inch diskettes any more? How many could read a Multimate 1.0 word processing file, or a Visicalc spreadsheet? Consider reviewing archived data at least every couple of years and transferring it to newer media as well as data formats if appropriate. A poignant, timely example of this peril is the millions of lines of source code for production applications which, after a decade or more without change, are suddenly critical to achieving year 2000 compliance.
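The periodic review suggested above can be driven from a simple catalog of archival media and the date each was last verified as readable. In this sketch the two-year interval stands in for "every couple of years" and is an assumption, as are the media labels and the function name.

```python
from datetime import date

REVIEW_INTERVAL_YEARS = 2  # assumed reading of "every couple of years"


def due_for_review(
    media: dict[str, date],
    today: date,
    interval_years: int = REVIEW_INTERVAL_YEARS,
) -> list[str]:
    """Return labels of media not verified within the review interval."""
    due = []
    for label, last_verified in media.items():
        age_days = (today - last_verified).days
        if age_days >= interval_years * 365:
            due.append(label)
    return sorted(due)


# Hypothetical catalog: label -> date last read back successfully.
catalog = {"ARCH-001": date(1994, 3, 1), "ARCH-002": date(1998, 1, 15)}
overdue = due_for_review(catalog, today=date(1998, 6, 1))
```

Media flagged this way are candidates both for a read test and for migration to newer media or formats.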
Internet-based backup schemes work particularly well for laptop computers, especially for mobile workers, telecommuters and home- or small-office environments. Of course, bandwidth is a considerable factor, limiting this method to a few megabytes of essential files.
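The bandwidth limit is easy to quantify: transfer time is the number of bits divided by the link speed. The sketch below assumes a 56 kbps dial-up link for the examples, a figure that does not come from the article, and ignores protocol overhead unless an efficiency factor is supplied.

```python
def transfer_seconds(
    size_megabytes: float,
    link_kbps: float,
    efficiency: float = 1.0,
) -> float:
    """Seconds needed to send size_megabytes over a link of link_kbps.

    efficiency is the usable fraction of the nominal link speed (1.0 = all).
    """
    bits = size_megabytes * 1_000_000 * 8
    return bits / (link_kbps * 1_000 * efficiency)
```

Under these assumptions, 7 megabytes takes 1,000 seconds (under 17 minutes), while a full two-gigabyte removable disk would take more than three days, which is why such services suit a few megabytes of essential files.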
For individual desktop systems, the high-capacity, removable disks or Travan tape drive technologies offer convenience as well as moderate cost. Remember that the most sophisticated technology won't be of much value if nobody puts a disk or tape in the drive and invokes the backup program. Backup schemes which depend on an individual to proactively run a backup often fail.
Who Does What When?
Accountability for backup processing and storage is becoming a mounting frustration for many organizations. End users want to believe that IT takes care of everything - another way of saying, "when my data file gets creamed, even if I forgot to back it up, at least I can point a finger at IT." With the proliferation of desktop, laptop and decentralized server environments, it is tough to even track what to back up, let alone actually produce backups.
The most successful backup processes this author has observed in a broad variety of environments have these factors in common:
- They are highly automated and do not significantly depend on human intervention for routine operation.
- They dynamically adapt (or adapt with minimal human intervention) to platform, network, workstation and other changes.
- They monitor, report and provide an escalation and follow-up resolution structure for problems or exceptions.
- They employ built-in fail-safes, such as alternate backup devices, and have been built with a comfortable capacity and error margin.
- They are regularly audited and validated. In addition to accuracy and usability, capacity, performance and reliability are reviewed.
In most networked environments, centralizing essential data on servers rather than individual workstations can make backup as well as recovery much more practical than if essential data is scattered among many workstations. On the other hand, backup software and dedicated file backup servers which scan connected workstations as well as data servers and dynamically back up modified or new data are becoming increasingly practical.
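The monitoring, reporting and escalation factor listed above could be sketched as a check on the time of each system's last successful backup. The 24-hour limit, the doubling rule for escalation and the system names are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def backup_exceptions(
    last_success: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> dict[str, str]:
    """Flag systems whose last good backup is older than allowed.

    Systems past max_age get "notify"; past twice max_age, "escalate".
    """
    report = {}
    for system, finished in last_success.items():
        age = now - finished
        if age > 2 * max_age:
            report[system] = "escalate"
        elif age > max_age:
            report[system] = "notify"
    return report


# Hypothetical status data for illustration only.
status = {
    "payroll": datetime(1998, 6, 10, 1, 0),
    "pharmacy": datetime(1998, 6, 8, 23, 0),
    "orders": datetime(1998, 6, 7, 1, 0),
}
exceptions = backup_exceptions(status, now=datetime(1998, 6, 10, 8, 0))
```

Run on a schedule, a report like this removes the dependence on someone remembering to look, which is the point of the automation factors in the list.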
Whether centralized or not, the IT operational organization is almost always the logical area in which to place accountability for enterprise-wide data backup - if for no other reason than this is where blame will be placed (whether justified or not) in the event of a data loss elsewhere in the organization.
Selected Information Resources
Association for Information and Image Management (AIIM) - 301.587.8202
Association of Records Managers and Administrators (ARMA International) - 913.341.3808 or 800.422.2762.
Institute of Internal Auditors (IIA) www.theiia.org ; 407.830.7600.
The Rothstein Catalog On Disaster Recovery, a source for books, software, videos, research reports - www.rothstein.com ; 888-ROTHSTEIN
TEN DATA
- Move it. On-site backups are all but worthless in the event of physical disruptions or loss of access. The quicker the backup is transported off-site, the more usable - and accessible - it is likely to be when most needed.
- Secure it. Protect the backups at least as well as the primary systems, from loss or disclosure as well as tampering. Store the backups, especially if magnetic media, in a conditioned environment.
- Test it. There is no way to be certain backup tapes are usable short of testing, by actually attempting to recover data and systems in a test or recovery environment.
- Segregate it. Enforce separation of access and authority between production and backup data. Ideally, no individual will have access to both.
- Meet business needs. Instead of mechanically scheduling backup processing, tailor backups to the essential business processes, flows, priorities and timings. Involve end users in backup planning.
- Be realistic. Match the technology to the organization's capabilities. Don't go overboard on technology you cannot effectively control.
- Play it safe. It is doubtful anybody ever was fired for being too careful protecting critical company data, although certain types of data may need to be excluded.
- Leave out data where appropriate. In some cases, too much backup or excessively long retention can increase exposure from litigation - you may need to exclude or limit retention of e-mail, voicemail or other transient data.
- Trust nobody. Build in controls and accountabilities to protect against sabotage or malfeasance. Ideally, invite internal audit or other non-IT professionals such as records management or risk management to review and critique the backup process.
- Keep it simple. Concentrate on protection, reliability and usability instead of getting caught up in sizzling technology. And - the most important rule -
- No backup process is going to work if you don't actually use it. The most sophisticated data backup system cannot implement or run itself; someone has to take the lead.
Philip Jan Rothstein, FBCI, is President of Rothstein Associates, Inc., a management consultancy emphasizing business continuity. He is publisher of The Rothstein Catalog On Disaster Recovery and editor of the book Disaster Recovery Testing: Exercising Your Contingency Plan. He was elected Fellow, Business Continuity Institute in 1994 in recognition of his substantial contributions to the industry.