Understanding RAID

Author: Joshua Moore

June 11, 2007 | 05:25

Tags: #hard-drive #raid #redundancy

In the last few years RAID has become very popular. Once purely the domain of high-end enterprise servers, it is now something any self-respecting enthusiast motherboard had better offer onboard if it wants to be taken seriously. The abundance of onboard RAID controllers means that it’s not unusual to see small arrays in today’s home computers, whether for increased speed, increased reliability or simply bragging rights. After all, two (or more) disks are better than one, right?

Depending on whom you ask, RAID stands for either Redundant Array of Inexpensive Disks or Redundant Array of Independent Disks. Technically, the former was the original name for combining more than one drive into an array. The term ‘inexpensive’ was used because RAID was conceived as a substitute for proprietary disk solutions that offered acceptable performance and fault tolerance but were prohibitively expensive.

RAID was a way to increase performance and add fault tolerance whilst using off-the-shelf disks, greatly reducing costs. There are many different types of RAID, each with its own strengths and drawbacks. No single RAID level is ‘the best’, so it is important to pick the level that best suits your particular situation.

The different types of RAID can offer a multitude of benefits, whether for an Oracle database accessed by thousands of users simultaneously, a high-performance HD video workstation or simply a home user storing photos. Each case has a different set of requirements and a vastly different budget. Choosing a RAID level is always a balance between the pros and cons of each; the main aspects to consider are performance, redundancy and, of course, cost.

Some RAID levels focus on all-out performance without bothering with redundancy; others treat redundancy as the foremost concern, and performance can suffer accordingly. Certain types of RAID require a powerful hardware controller to give acceptable performance, resulting in high costs, whereas others can give adequate performance using a software solution. So what exactly can RAID offer?

Increased Performance

There is a limit to the rate at which data can be read from or written to a hard disk platter. Unfortunately, due to the mechanical construction of hard drives, this limit is considerably lower than the rate at which data travels around every other part of a computer. The platters and actuators inside a hard drive can only move so fast, and they are bound by mechanical constraints and tolerances that solid-state memory (e.g. RAM) does not suffer from.

Most implementations of RAID offer increased performance over a single disk by reading from or writing to many disks at the same time. In theory, data can be retrieved from two disks in half the time it takes from a single disk, from eight disks four times as fast as from two, and so on. In practice this is not exactly true, as RAID controller overheads and redundancy calculations slow the process down, but whether it’s an array of two disks or two thousand, RAID can certainly increase performance.
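As a rough illustration of the arithmetic above, the sketch below models the idealised case in which every disk in the array streams at full speed in parallel, so aggregate throughput grows linearly with the number of disks. The 60 MB/s per-disk figure and the `efficiency` factor (a hypothetical stand-in for controller and redundancy overheads) are assumptions for illustration, not measurements of any real drive or controller.

```python
def aggregate_throughput(per_disk_mb_s: float, disks: int, efficiency: float = 1.0) -> float:
    """Idealised array throughput in MB/s.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the perfect
    linear speed-up actually achieved; 1.0 means no overhead at all.
    """
    if disks < 1:
        raise ValueError("an array needs at least one disk")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return per_disk_mb_s * disks * efficiency


def transfer_seconds(size_mb: float, per_disk_mb_s: float, disks: int,
                     efficiency: float = 1.0) -> float:
    """Time to read size_mb megabytes from the array under the idealised model."""
    return size_mb / aggregate_throughput(per_disk_mb_s, disks, efficiency)


if __name__ == "__main__":
    PER_DISK = 60.0  # assumed sustained MB/s for a single drive
    for n in (1, 2, 8):
        print(f"{n} disk(s): {transfer_seconds(1200, PER_DISK, n):.1f} s for 1200 MB")
```

With these assumed numbers a 1200 MB read takes 20 s on one disk, 10 s on two and 2.5 s on eight, matching the "half the time" and "four times as fast" figures. Setting `efficiency` below 1.0 is one crude way to reflect the real-world overheads the paragraph mentions.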

Data Security

Data security is the ability of an array to suffer the complete failure of one, or sometimes several, hard drives without losing any of the data it contains. This is achieved through data redundancy, i.e. sacrificing some disk capacity to store extra data. Redundancy can be provided either via mirroring, that is, holding a duplicate copy of all of the data, or through parity information, which we’ll come to later.
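To make the two forms of redundancy a little more concrete, here is a minimal sketch that protects the same three data blocks first with a mirror and then with a single XOR parity block, and rebuilds a "failed" disk in each case. XOR is the usual basis for single-parity schemes, but the block contents, block size and layout here are purely illustrative assumptions; real controllers stripe and distribute blocks across disks in ways this toy example ignores.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and all the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three illustrative data blocks, one per data disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
lost = 1  # pretend the disk holding data[1] has failed

# Mirroring: every block is stored twice, so recovery is just reading the copy.
mirror = [bytes(block) for block in data]
rebuilt_from_mirror = mirror[lost]

# Parity: one extra block holds the XOR of all the data blocks.
parity = xor_blocks(data)
survivors = [block for i, block in enumerate(data) if i != lost]
# XOR-ing the surviving blocks with the parity block cancels them out,
# leaving exactly the missing block.
rebuilt_from_parity = xor_blocks(survivors + [parity])

print(rebuilt_from_mirror, rebuilt_from_parity)
```

Both approaches recover `b"EFGH"`, but they spend capacity differently: the mirror stores a full extra copy of every block, while the parity version stores one extra block for all three, in exchange for recalculating parity on writes and during a rebuild.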

Data Availability

Data availability, not to be confused with data security, is the ability of an array to sustain a disk failure with neither data loss nor interruption to service. While this may not be especially important to a home user, it can be vital to a business: sometimes it’s simply not acceptable to shut down a service, a website for instance, just to replace a failed disk. Data availability must include data security, but data security does not necessarily imply data availability. Hot-swapping and hot-sparing are features implemented in many RAID controllers that allow a failed disk to be replaced and its data rebuilt without taking the array offline.

Increased Capacity

When you need a lot of space on a single volume, sometimes even the largest hard drives available are not large enough. If you wanted to record several hours of uncompressed 1080p video, even the latest and greatest 1TB drive wouldn’t be large enough to hold it all. An array of several 500GB disks would not only create a single ‘drive’ with sufficient space, but would also be far cheaper per GB than the flagship 1TB models. Extreme space requirements aren’t the only case that benefits from this. If you need an ultra high-performance database, it makes sense to use ultra high-performance drives. Currently 2.5” 15k RPM drives are the crème de la crème of high I/O performance, and you’re going to need more than one of them, as the largest available is a mere 73GB.
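The claim about uncompressed 1080p video is easy to check with some back-of-the-envelope arithmetic. The sketch below assumes 8-bit 4:2:2 sampling (2 bytes per pixel) at 30 frames per second with no audio, and uses decimal gigabytes as drive manufacturers quote them. It also ignores filesystem overhead and assumes the disks are simply striped with no redundancy. Other formats and frame rates will change the numbers, so treat them as illustrative.

```python
import math

BYTES_PER_GB = 1_000_000_000  # decimal GB, as quoted on drive labels


def uncompressed_video_bytes(hours: int, width: int = 1920, height: int = 1080,
                             bytes_per_pixel: int = 2, fps: int = 30) -> int:
    """Raw size of uncompressed video; defaults assume 8-bit 4:2:2 1080p at 30 fps."""
    return width * height * bytes_per_pixel * fps * 3600 * hours


def disks_needed(total_bytes: int, disk_gb: int) -> int:
    """Minimum number of equal disks whose combined capacity holds total_bytes."""
    return math.ceil(total_bytes / (disk_gb * BYTES_PER_GB))


if __name__ == "__main__":
    three_hours = uncompressed_video_bytes(3)
    print(f"3 h of 1080p: {three_hours / BYTES_PER_GB:.0f} GB")
    print("500GB disks needed:", disks_needed(three_hours, 500))
    print("73GB disks for a 500GB database:", disks_needed(500 * BYTES_PER_GB, 73))
```

Under these assumptions three hours comes to roughly 1,344GB, more than a single 1TB drive holds, while three striped 500GB disks would be enough. The last line applies the same helper to the 73GB 15k RPM drives; the 500GB database size there is a hypothetical figure.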

Before getting into the nitty-gritty about which RAID level does what, we need to familiarise ourselves with a few terms…