Musings on IT, data management, whitewater rafting, and backpacking

Monday, March 21, 2011

Post-RAID storage designs

Large RAID arrays are a pain. Maybe we're better off without them.

Here are some preliminary thoughts on a post-RAID storage design that meets our needs.

We've had big problems with RAID arrays, including multi-week rebuilds and multi-week tape restores.

Now I'm designing a server refresh, and taking a fresh look at storage.

We need:
  • 600 GB of ultra-fast storage.
  • 9 TB of fast storage.
  • 50 TB of slow, reliable storage.
My initial designs specified:
  • 600 GB PCIe Flash card for ultra-fast storage. 
  • 600 GB, 15K RPM, SAS drives with RAID 0 for fast storage. 
  • 2 TB, 7.2K RPM, SAS drives with RAID 6 for slow storage. 
Speed isn't everything if your system is down for weeks recovering from RAID failures. We also want to recover quickly from disk failures and from corrupted or accidentally deleted files.

So how about this design instead?
  • 600 GB PCIe Flash card for ultra-fast storage.
  • 600 GB, 15K RPM, SAS drives as JBOD for fast storage.
  • 2 TB, 7.2K RPM, SAS drives as JBOD for slow storage.
  • More 2 TB, 7.2K RPM, SAS drives as JBOD mirrors of each of the above.
  • Daily rsync mirror of the ultra-fast storage to a 600 GB mirror drive partition.
  • Daily rsync mirror of each fast disk to a 600 GB mirror drive partition.
  • Daily rsync mirror of each slow disk to a mirror drive.
  • Run the rsync jobs at about 6 pm, Monday through Friday only (a sketch of these jobs follows this list).
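As a concrete but purely illustrative sketch of those rsync jobs: the mount points and device names below are made up, the script would be launched by a cron entry such as 0 18 * * 1-5, and the crude smartctl health check is just one way to avoid blindly mirroring a failing disk (step 1 of the failure procedure below).

#!/usr/bin/env python
# nightly_mirror.py -- illustrative only; paths and devices are made up.
# Run from cron at 18:00 Mon-Fri, e.g.:  0 18 * * 1-5  /usr/local/bin/nightly_mirror.py
import subprocess, sys

# (source mount, mirror mount, underlying device to health-check)
JOBS = [
    ("/flash/ultrafast", "/mirror/ultrafast", "/dev/sda"),
    ("/data/fast1",      "/mirror/fast1",     "/dev/sdb"),
    ("/data/slow1",      "/mirror/slow1",     "/dev/sdc"),
]

def disk_healthy(device):
    # Crude check: smartctl -H exits 0 only when it ran cleanly and health is PASSED.
    return subprocess.call(["smartctl", "-H", device]) == 0

failures = 0
for source, mirror, device in JOBS:
    if not disk_healthy(device):
        # Don't propagate a suspect disk to its mirror; investigate instead.
        print("skipping %s: %s failed its SMART health check" % (source, device))
        failures += 1
        continue
    # -a archive, -H hard links, -x stay on one filesystem, --delete keep mirror exact
    rc = subprocess.call(["rsync", "-aHx", "--delete",
                          source + "/", mirror + "/"])
    if rc != 0:
        failures += 1

sys.exit(1 if failures else 0)

Note that the daily schedule, not rsync itself, is what gives us a window to notice corrupted or deleted files before the mirror reflects them.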
This design needs more slow disks than the first design, but we gain several back by dropping the RAID 6 parity disks.
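To put rough numbers on that (the drive counts are illustrative, and I'm guessing at the RAID 6 group layout): 50 TB on 2 TB drives is about 25 data drives. Two RAID 6 groups would add roughly 4 parity drives, about 29 in all, while mirroring every JBOD drive needs about 50. So the mirrors do cost extra drives, but the parity drives come back, and every drive holds directly readable data.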
What happens if a disk fails?
    1. We need to recognize the problem before the next rsync job runs
    2. Halt that particular rsync job
    3. Redirect to the mirror partition or drive, possibly with reduced performance
    4. Schedule downtime to power-cycle the failed drive, and replace it if needed
    5. Copy the mirror partition or drive back to the original drive
    6. Redirect back to the original drive
    7. Restart the rsync job
Step 5 is where we save a lot of time compared to rebuilding or restoring a large failed RAID array. All the other steps are the same as for a RAID failure, or don't take much time (e.g. halting and restarting rsync jobs).
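A sketch of step 5, again with hypothetical mount points: once the replaced drive is back online and mounted, the copy-back is a single rsync in the opposite direction, bounded by one drive's worth of data rather than a whole array.

#!/usr/bin/env python
# copy_back.py -- illustrative step 5 only; mount points are made up.
import subprocess, sys

MIRROR   = "/mirror/fast1"   # mirror partition we failed over to in step 3
ORIGINAL = "/data/fast1"     # replacement (or power-cycled) original drive

# Same flags as the nightly job: archive mode, hard links, one filesystem, exact copy.
rc = subprocess.call(["rsync", "-aHx", "--delete", MIRROR + "/", ORIGINAL + "/"])
sys.exit(rc)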

Even if we needed to recover one disk from a backup tape, that would take much less time than recovering a large RAID array from many backup tapes.

We would have similar steps for recovering a corrupted or deleted file. Ideally, we would have daily ZFS snapshots, but that has other issues.
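For comparison, a daily ZFS snapshot job is about this small (the dataset name is hypothetical, and this says nothing about those other issues):

#!/usr/bin/env python
# daily_snapshot.py -- illustrative; "tank/slow" is a made-up dataset name.
import datetime, subprocess, sys

dataset = "tank/slow"
snapname = "%s@daily-%s" % (dataset, datetime.date.today().isoformat())

# e.g. "zfs snapshot tank/slow@daily-2011-03-21"
sys.exit(subprocess.call(["zfs", "snapshot", snapname]))

A deleted or corrupted file could then be copied back out of the dataset's .zfs/snapshot directory instead of going to the mirror or to tape.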

Why the specific rsync days and times?
  • Virtually all of our work is done during normal working hours, 8 am to 6 pm, Monday through Friday.
  • We typically recognize and begin restoration procedures only during normal working hours.
  • We don't want our mirrors to reflect corrupted disks or files before we get a chance to recognize and restore.
That means we don't want rsync jobs running on weekends, we don't want rsync jobs running after long periods of low file activity, and we really don't want RAID 1, which instantly mirrors problems to the other disk!

This design also reduces our risk from total RAID array recovery failure, i.e. everything goes wrong and we are unable to recover any data on a large RAID array. With this design, a double or triple disk failure only loses the data on those disks, rather than the entire array. And others have observed correlated or cascading disk failures due to identical designs, the same manufacturing batch, and identical operating environments.

I'll have to think through all the implications of this design for a while.
