Musings on IT, data management, whitewater rafting, and backpacking

Tuesday, November 16, 2010

Multi-level storage on the cheap

Many vendors sell software or hardware to move your data automagically between different storage tiers. They charge a lot for this magic. Multiple storage tiers are almost synonymous with these expensive solutions.

We can get most of the benefits by using our brains instead, for far less money.

Virtually all of our server storage holds files that real people add and remove, rather than databases that fill automatically with new entries.

People are smart and understand speed and limits, so let users manage the storage tiers.

Imagine a server where users have several home directories:
  1. /fastest/username
  2. /fast/username
  3. /slow/username
  4. /slowest/username
or, with the slower tiers nested under the usual home directory:
  1. /home/username/   [fastest storage tier]
  2. /home/username/slow
  3. /home/username/slower
  4. /home/username/slowest
Slower tiers would have more capacity and larger disk quotas.
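
One way to pin this layout down is as a small table in code. The sketch below uses the nested /home/username layout; the Tier class, the tier_paths helper, and every quota number are hypothetical placeholders chosen for illustration, not figures from a real deployment.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Tier:
    name: str      # label used in this sketch only
    subdir: str    # path under the user's home; "" means the home itself
    quota_gb: int  # hypothetical per-user quota, purely illustrative


# Ordered fastest to slowest; slower tiers get larger (placeholder) quotas.
TIERS = [
    Tier("fastest", "", 10),
    Tier("slow", "slow", 100),
    Tier("slower", "slower", 1000),
    Tier("slowest", "slowest", 10000),
]


def tier_paths(username: str, home_root: str = "/home") -> dict:
    """Map each tier name to the user's directory on that tier."""
    home = PurePosixPath(home_root) / username
    return {
        t.name: str(home / t.subdir) if t.subdir else str(home)
        for t in TIERS
    }
```

For example, tier_paths("alice")["slower"] gives "/home/alice/slower". Each subdirectory would presumably be a mount point or symlink onto the slower storage.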

Users will naturally fill the fastest tier first, but as that fills, they will be forced to move their files to the slower tiers.

Each tier would be built on a different technology, with its own speed and cost.

Today, those tiers might look like:
  1. Fastest – PCIe Flash drives
  2. Fast – SAS MLC SSDs
  3. Slow – 7200 RPM SATA RAID or ZFS filesystems
  4. Slowest – Network storage to file servers
Other technologies could be used, like SSD-enhanced ZFS, Drobo storage products, or cloud storage. Each will have different cost-capacity-speed tradeoffs.

We must apply quotas at each tier to ensure some fairness.
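
Real enforcement belongs to the filesystem's own quota mechanism, but a rough usage check is easy to sketch for reporting. The functions below (directory_usage_bytes and over_quota, both names invented here) add up regular-file sizes under a directory. The exclude argument is an assumption to cope with the nested layout, where the slower tiers sit inside the fastest tier's directory and should not count against it.

```python
import os
import stat


def directory_usage_bytes(root: str, exclude=()) -> int:
    """Sum the sizes of regular files under root, skipping excluded subtrees."""
    excluded = {os.path.abspath(p) for p in exclude}
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def over_quota(root: str, quota_bytes: int, exclude=()) -> bool:
    """True if the files under root use more than quota_bytes."""
    return directory_usage_bytes(root, exclude) > quota_bytes
```

This counts apparent file sizes, not allocated blocks, so it will not match the filesystem's own accounting exactly.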

We could expand storage at each tier as budget allows.

We could help users move stale files to slower tiers by sending periodic reports on the least-recently-used files at each tier.
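
Such a report could be as simple as a directory walk sorted by last access time. This is a minimal sketch, assuming the filesystem records access times; volumes mounted with noatime or relatime will give stale or coarse values. The function names and the default limit of 20 are invented for illustration.

```python
import os
import stat
import time


def least_recently_used(root: str, limit: int = 20) -> list:
    """Return up to `limit` (atime, path) pairs, oldest access first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_atime, path))
    entries.sort()  # tuples sort by atime first, so the oldest come first
    return entries[:limit]


def format_report(entries) -> str:
    """Render one 'YYYY-MM-DD  path' line per entry."""
    lines = []
    for atime, path in entries:
        day = time.strftime("%Y-%m-%d", time.localtime(atime))
        lines.append(f"{day}  {path}")
    return "\n".join(lines)
```

A cron job could run this per user and per tier, then mail the output of format_report to the owner of the directory.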

We could use a similar strategy for web servers and other servers managed by sysadmins. Sysadmins can use their brains to optimize storage, especially if they have tools to understand I/O bottlenecks. But rules of thumb might be sufficient (and cheaper), e.g., put applications and swap on the fastest tier, put the most frequently used data on the next tier, and put redundant copies on the slowest tier.
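
Those rules of thumb fit in a lookup table. The sketch below is only an illustration: the category names, the suggest_tier helper, and the "slow" default for anything unlisted are assumptions, not part of the rules themselves.

```python
# Rule-of-thumb placement: kind of data -> tier name.
PLACEMENT = {
    "application": "fastest",
    "swap": "fastest",
    "hot_data": "fast",          # most frequently used data
    "redundant_copy": "slowest",
}


def suggest_tier(kind: str, default: str = "slow") -> str:
    """Return the suggested tier for a kind of data; `default` is an assumption."""
    return PLACEMENT.get(kind, default)
```

A table like this is easy to argue about and change, which is most of the point of using brains instead of magic.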
