RAID Systems: A Description and Analysis of Common RAID Types and Their Functions.
If used correctly, disk drive arrays can provide advantages over a single drive: higher reliability and a higher data transfer rate.
Simply replacing one drive with a group of drives will not increase reliability, since the whole system then fails as soon as any one of those drives fails. In fact, reliability (mean time between failures, or MTBF) decreases as the number of drives grows, because the probability that one of them fails increases. This is why a certain level of redundancy is needed in the design of a drive array to increase the reliability of the entire storage system.
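The effect of drive count on failure probability can be illustrated with a small sketch. It assumes that drives fail independently and that each has the same probability of failing within some fixed period; the 3% figure and the function name `p_any_failure` are hypothetical and chosen only for illustration.

```python
def p_any_failure(p_single: float, n_drives: int) -> float:
    """Probability that at least one of n_drives fails in the period,
    assuming independent failures with per-drive probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    # All drives survive with probability (1 - p)^n; the complement is
    # the chance that at least one of them fails.
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    # Hypothetical per-drive failure probability of 3% for the period.
    for n in (1, 2, 4, 8):
        print(f"{n} drive(s): {p_any_failure(0.03, n):.4f}")
```

Without redundancy, any single failure takes the array down, so this probability is also the probability of losing the array in that period.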
RAID is an assembly of disk drives, known as a disk array, that operates as one storage unit. In general, the drives could be any storage system with random data access, such as magnetic hard drives, optical storage, magnetic tapes, etc. When speed (data transfer rate) is an issue, the fastest SCSI hard drives are typically used.
Redundant Array of Independent Disks (RAID) technology serves the following functions:
- Immediate availability of data and, depending on the RAID level, recovery of lost data.
- Redundancy of data at a chosen level.
Depending on the RAID level you are using, this method of data storage provides the data redundancy needed for a highly secure system, with the additional benefit of faster data retrieval through multiple-channel access. If one or a few disk drives fail, they can normally be replaced without interrupting system operation. Thus, disk arrays can ensure that no data is lost if one disk drive in the array fails.
The array includes drives, controllers, enclosure, power supplies, fans, cables, etc. and software. Each array is addressed by the host computer as one drive. There are several types of RAID configuration, called levels, which control the ways of organizing data on the drives and organizing the flow of data to and from the host computer.
In 1993, the RAID Advisory Board (RAB) established the RAID Level Conformance program, which closely followed the initial classification found in the early UC Berkeley work.
There were 6 RAID levels defined.
The numbers used to describe the RAID levels do not imply improved performance, complexity, or reliability.
- RAID-0 is Striping RAID. It does not, by itself, contribute to EDAP and provides NO redundancy, since if one drive fails, all the data in the array will be lost.
- Two types of RAID provide EDAP for the drives: Mirroring and Parity.
Mirroring appeared earlier (in the UC Berkeley Papers) and was originally designated as RAID Level 1. Its main problem is that it requires 100% redundancy or twice as much capacity as was originally needed. On the other hand, its read performance is improved and a higher percentage of drives in a RAID-1 system may fail simultaneously as compared to a Parity RAID system.
- RAID Level 2 requires the use of non-standard disk drives and is therefore not commercially viable.
- Parity RAID was identified in the UC Berkeley Papers as RAID Levels 3, 4, 5 and 6.
Parity RAID significantly reduces redundancy overhead to a range of 10% to 33% (compared to 100% for mirroring in RAID-1). RAID Levels 3-5 provide EDAP in case of one disk failure, and RAID-6 tolerates the failure of two disks, whether they fail at the same time or the second disk fails later, during the reconstruction period.
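The overhead figures above can be reproduced with a short calculation. This is a sketch under one assumption: overhead is read as the number of redundant drives divided by the number of data drives, which is the reading that yields 100% for a mirrored pair. The function name `redundancy_overhead` is hypothetical.

```python
def redundancy_overhead(data_drives: int, redundant_drives: int) -> float:
    """Redundant capacity as a fraction of data capacity.

    Assumes all drives are the same size, so counting drives is enough.
    """
    if data_drives < 1 or redundant_drives < 0:
        raise ValueError("need at least one data drive and no negative counts")
    return redundant_drives / data_drives


if __name__ == "__main__":
    # Mirroring: one mirror drive per data drive.
    print(f"RAID-1, 1 data + 1 mirror:   {redundancy_overhead(1, 1):.0%}")
    # Single-parity RAID: one drive's worth of parity per group.
    print(f"Parity, 3 data + 1 parity:   {redundancy_overhead(3, 1):.0%}")
    print(f"Parity, 10 data + 1 parity:  {redundancy_overhead(10, 1):.0%}")
```

Under this reading, the 10% to 33% range corresponds to single-parity groups of roughly ten down to three data drives.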
Levels 1, 3, and 5 are the most commonly used RAID levels and are summarized below, with RAID-0 included for comparison.
RAID-0 (striping):
- Provides NO redundancy, since the data are written across multiple drives (so-called striping). If one drive fails, all the data in the array will be lost.
- Provides higher data rates, since all drives are accessed in parallel.
RAID-1 (mirroring):
- Data mirroring. High reliability. The same data is written to and read from two (or more) drives.
- Faster reading, since the first drive to respond to a request will provide the data, thus reducing latency.
- The cost at least doubles for a given storage capacity.
- * MTBF ~ 2M + M^2/R (see legend below)
RAID-3 (dedicated parity drive):
- One extra drive is added to store the parity data (error correction data). If one drive fails, the data can be recovered and the other drives will keep working until the failed one is replaced (of course, performance will suffer).
- High reliability (cheaper than mirroring in RAID-1).
- Very high data rates. Data writing and reading occur in parallel.
- For a given capacity, fewer drives are needed than for RAID-1.
- Controller may be more complex and expensive.
RAID-5 (distributed parity):
- Data and parity information are striped across all drives.
- High reliability, high performance.
* Legend: consider N drives connected to the server, each with MTBF equal to M and mean time to repair R.
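The parity recovery described for the parity levels can be sketched as follows. The article does not say which parity function is used; this example assumes byte-wise XOR, which is the usual choice for single-parity RAID. The block contents and the helper name `xor_blocks` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the i-th byte of every block together.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # One stripe: three data blocks, one per data drive (hypothetical data).
    data = [b"RAID", b"demo", b"data"]
    parity = xor_blocks(data)  # stored on the extra (parity) drive

    # Simulate losing the second data drive and rebuild its block
    # from the surviving data blocks plus the parity block.
    survivors = [data[0], data[2], parity]
    rebuilt = xor_blocks(survivors)
    print("recovered block matches original:", rebuilt == data[1])
```

Because XOR is its own inverse, the same routine both computes the parity and rebuilds a single missing block; it cannot recover from two simultaneous losses in one stripe.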
RAID-0 is inappropriate when data availability (reliability) is an issue, since it provides no data redundancy and no failure tolerance. Data mirroring (RAID-1) or parity checking (RAID-3 to RAID-5) is needed in this case. RAID-5 is often recommended because of its combination of high data availability and good performance. In case of a drive failure, even if the data is still physically available, it may be inaccessible within the needed period due to a drastic drop in system performance. A performance drop of, say, 50% is not unusual during reconstruction of the system.
RAID Performance Characteristics
[Table not recovered. Surviving entries: Large Data Transfers; High I/O Rate; 10,000 to 1,000,000 hours.]
* Availability is equal to the MTBF of one disk divided by the number of disks in the array.
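The two reliability estimates quoted in this article can be turned into a small calculator. This sketch simply encodes the article's own approximations: M/N for an array without redundancy, and the mirroring estimate read as 2M + M^2/R. The function names and the sample values for M and R are hypothetical.

```python
def striped_mtbf(m_hours: float, n_drives: int) -> float:
    """Article's estimate for an array with no redundancy: M / N."""
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    return m_hours / n_drives


def mirrored_mtbf(m_hours: float, r_hours: float) -> float:
    """Article's estimate for a mirrored pair: 2M + M^2 / R."""
    if r_hours <= 0:
        raise ValueError("r_hours must be positive")
    return 2 * m_hours + m_hours ** 2 / r_hours


if __name__ == "__main__":
    m = 500_000.0  # hypothetical single-drive MTBF in hours
    r = 24.0       # hypothetical mean time to repair in hours
    for n in (2, 4, 8):
        print(f"{n} drives, no redundancy: {striped_mtbf(m, n):,.0f} hours")
    print(f"mirrored pair: {mirrored_mtbf(m, r):,.0f} hours")
```

The M^2/R term dominates whenever repair is fast relative to drive lifetime, which is why a short replacement period matters so much for mirrored arrays.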
With the falling price of hard drives, RAID systems have become affordable enough for home use as well.
RAID-0 and RAID-1 are simpler to integrate than the other, more complex levels.
A hot-spare drive is a special drive that is designated for automatic use if any drive within an array fails.
The hot spare must have a storage capacity greater than or equal to that of the smallest drive in the array. It is possible to define as many hot spares as you want. A RAID-1 array on an adapter can use the hot-spare disk drives on that adapter. If a drive within an array fails, the adapter automatically engages a hot spare in place of the failed disk drive and rebuilds the data that was on the failed disk onto the hot spare.
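The capacity rule for hot spares can be expressed as a simple selection routine. This is only a sketch: the `Drive` type, the `pick_hot_spare` function, and the policy of preferring the smallest eligible spare are assumptions made for illustration, not the behavior of any particular adapter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drive:
    name: str
    capacity_gb: int


def pick_hot_spare(array: List[Drive], spares: List[Drive]) -> Optional[Drive]:
    """Return a spare at least as large as the smallest drive in the array.

    Prefers the smallest eligible spare so that larger spares stay
    available for other arrays (an assumed policy). Returns None if no
    spare is large enough.
    """
    if not array:
        raise ValueError("array must contain at least one drive")
    required = min(d.capacity_gb for d in array)
    eligible = [s for s in spares if s.capacity_gb >= required]
    return min(eligible, key=lambda s: s.capacity_gb, default=None)


if __name__ == "__main__":
    array = [Drive("disk0", 500), Drive("disk1", 1000)]
    spares = [Drive("spare0", 250), Drive("spare1", 2000), Drive("spare2", 500)]
    chosen = pick_hot_spare(array, spares)
    print("selected spare:", chosen.name if chosen else "none available")
```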
Extended Data Availability & Protection (EDAP)
EDAP is the ability of a disk system to provide timely, continuous, on-line access to reliable data under certain specified abnormal conditions. These conditions, as described by RAB, include (this is the RAB's exact description):
Failures within the disk system
Failures of equipment attached to the disk system including host I/O buses and host computers
Failures resulting from abnormal environmental conditions, including:
External power source out of operating range, temperature out of operating range, natural disasters such as floods and earthquakes, accidental disasters such as fires and unlawful acts such as sabotage, arson, terrorism, etc.
Replacement periods are the intervals required for replacement of a failed component. If "hot swap" is not supported by the disk system, then the component replacement period is tantamount to disk system down time. If "hot swap" is supported, then down time due to a replacement period is eliminated; however, until the failed component is replaced, the disk system is in a vulnerable period.
Vulnerable periods occur when the disk system has invoked its ability to circumvent a failure, rendering the system vulnerable to additional failures and causing the system to operate at something less than optimum performance until the fault is corrected.
In 1996, RAB introduced an improved classification of RAID systems. It divides RAID into three types:
- Failure-resistant disk systems (that protect against data loss due to disk failure)
- Failure-tolerant disk systems (that protect against loss of data access due to failure of any single component)
- Disaster-tolerant disk systems (that consist of two or more independent zones, either of which provides access to stored data).
The original "Berkeley" RAID classification is still kept, both as an important historical reference point and in recognition that RAID Levels 0-6 successfully define all known data mapping and protection schemes for disks.
Unfortunately, the original classification caused some confusion due to the assumption that a higher RAID level implies higher redundancy and performance. This confusion was exploited by RAID system manufacturers and gave birth to products with names such as RAID-7, RAID-10, RAID-30, RAID-S, etc. The new system describes the data availability characteristics of a RAID system rather than the details of its implementation.
The following list provides the criteria for all three classes of RAID:
Failure-resistant disk systems (meets criteria 1 to 6 minimum):
- Protection against data loss and loss of access to data due to disk drive failure
- Reconstruction of failed drive content to a replacement drive
- Protection against data loss due to a "write hole"
- Protection against data loss due to host and host I/O bus failure
- Protection against data loss due to replaceable unit failure
- Replaceable unit monitoring and failure indication
Failure-tolerant disk systems (meets criteria 1 to 15 minimum):
- Disk automatic swap and hot swap
- Protection against data loss due to cache failure
- Protection against data loss due to external power failure
- Protection against data loss due to a temperature out of operating range
- Replaceable unit and environmental failure warning
- Protection against loss of access to data due to device channel failure
- Protection against loss of access to data due to controller module failure
- Protection against loss of access to data due to cache failure
- Protection against loss of access to data due to power supply failure
Disaster-tolerant disk systems (meets criteria 1 to 21 minimum):
- Protection against loss of access to data due to host and host I/O bus failure
- Protection against loss of access to data due to external power failure
- Protection against loss of access to data due to component replacement
- Protection against loss of data and loss of access to data due to multiple disk failure
- Protection against loss of access to data due to zone failure
- Long-distance protection against loss of data due to zone failure
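The cumulative structure of these classes can be sketched as a simple lookup. The example assumes the criteria are numbered 1 to 21 in the order listed above, and that the third class (disaster-tolerant) requires all 21, following the pattern of the first two; the function name `classify` is hypothetical.

```python
from typing import Iterable

# Highest criterion number required by each class, strictest first.
# Numbering follows the order of the lists above (an assumption).
CLASS_THRESHOLDS = [
    ("disaster-tolerant", 21),
    ("failure-tolerant", 15),
    ("failure-resistant", 6),
]


def classify(criteria_met: Iterable[int]) -> str:
    """Return the strictest class whose criteria 1..k are all met."""
    met = set(criteria_met)
    for name, highest in CLASS_THRESHOLDS:
        if set(range(1, highest + 1)) <= met:
            return name
    return "unclassified"


if __name__ == "__main__":
    print(classify(range(1, 7)))                # criteria 1 to 6 only
    print(classify(range(1, 16)))               # criteria 1 to 15
    print(classify(list(range(1, 16)) + [20]))  # 1 to 15 plus one extra
    print(classify(range(1, 22)))               # all 21 criteria
```

Meeting an isolated higher-numbered criterion does not raise the class, since each class requires every criterion up to its threshold.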
Article provided compliments of www.usbyte.com.