Musings on IT, data management, whitewater rafting, and backpacking

Thursday, February 25, 2010

Massive Disk Failure, Part 4

Our disk array vendor exercised and tested both the old array and the failed drives for three weeks.

No trouble found.

From what I've read, this outcome is not unusual. Way back in December, we power-cycled the array and removed and reinserted the drives, trying to clear these problems.

Remember the problem with our oldest Thumper? The ZFS rebuild proceeded without drama, we swapped the bad drive for a spare, and life is back to normal. Maybe we got lucky in that case and did not experience correlated drive failures. Or maybe luck had nothing to do with it, and Sun qualifies its drives better than our other array vendor does.

We are actively pursuing a higher-speed WAN line, such as the 1 Gbps offer we got from Comcast. If we can get a link that fast, mirroring between sites becomes thinkable for our ~200 TB pile of storage. Then, if the primary system fails, we can continue operating (with lower performance) until we rebuild.
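To put "thinkable" in rough numbers, here is a back-of-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a link running flat out at its full rated speed, and zero protocol or replication overhead, so real-world figures would be worse. The 200 TB and 10 TB per day inputs come from this post; the function names are just illustrative.

```python
# Back-of-envelope WAN mirroring arithmetic.
# Assumptions: decimal units, a fully utilized link, and no protocol,
# encryption, or replication overhead. Real transfers will be slower.

BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, link_gbps: float) -> float:
    """Days needed to push `terabytes` (10**12 bytes each) over a `link_gbps` link."""
    bits = terabytes * 1e12 * BITS_PER_BYTE
    seconds = bits / (link_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


def sustained_gbps(terabytes_per_day: float) -> float:
    """Average line rate (Gbps) needed just to keep up with a daily data volume."""
    bits_per_day = terabytes_per_day * 1e12 * BITS_PER_BYTE
    return bits_per_day / SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    # Initial copy of the whole ~200 TB pile over a 1 Gbps line.
    print(f"Initial 200 TB sync at 1 Gbps: {transfer_days(200, 1.0):.1f} days")
    # Ongoing replication of ~10 TB of new data per day.
    print(f"10 TB/day needs about {sustained_gbps(10):.2f} Gbps sustained")
```

Under those optimistic assumptions, the initial sync takes roughly 18.5 days, and keeping up with 10 TB per day would consume about 0.93 Gbps around the clock, so even a 1 Gbps line would be close to saturated by replication alone.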

I feel boxed in by the poor choices available from the IT industry:

- Pay through the nose for "enterprise" class storage (10x to 20x the cost of other solutions), with no assurance that we get 10x to 20x better reliability and service.

- Suffer potentially catastrophic disk failures and extended recovery times with commodity storage, because drive manufacturers have pursued capacity increases without corresponding gains in reliability and I/O speed.

- Pay through the nose for WAN bandwidth, unless we happen to have competitive choices for WAN providers. I don't get to choose our location, so relocating to a major metropolitan area with fiber sprouting from every manhole is not an option.

We can't pay through the nose for much of anything. My IT budget has been flat or declining since I took this job, but the scientists we support have gone from generating 100 MB per day in the mid-1990s to 10 TB per day now. Add weekly unfunded mandates from headquarters, and there are years when I have no discretionary budget at all.

So what's the solution for large, reliable storage on a shoestring?

I know, I know: pick any two (or one) of cheap, big, or reliable. The people I support don't understand those tradeoffs. They are content to put all their data on $90, 1 TB external hard drives they picked up at Best Buy on their way into work, and they don't understand why I need to spend 10x that to get reliable storage.

Sometimes I don't understand, either.
