Musings on IT, data management, whitewater rafting, and backpacking

Tuesday, November 16, 2010

Multi-level storage on the cheap

Many vendors sell software or hardware to move your data automagically between different storage tiers. They charge a lot for this magic. Multiple storage tiers are almost synonymous with these expensive solutions.

We can get most of the benefits by using our brains instead, for far less money.

Saturday, November 6, 2010

Rocky's Hardtack Recipe

I really miss a bread-like food while backpacking, especially on longer trips. I tried many off-the-shelf bread, cracker, and tortilla variations, but they either tasted terrible or didn't last long in my pack or in storage.

Sometimes I would see references to hardtack, but hardtack is hard to find in local stores. I found a few recipes online, tried them, changed them, and came up with mine.

Don't let the name scare you: these aren't that hard (more like hard cookies), they taste pretty good, and they are easy to make for baking novices like me.

Lightweight backpacking in the early 1980s

Some people write as if the lightweight backpacking movement that started in the late 1990s was never tried before, and that all backpacking before that focused on comfort at the expense of weight.


Wednesday, October 27, 2010

Thin clients on a separate network?

Should we give everyone a thin client on a separate network for accessing sensitive applications?

Tuesday, October 26, 2010

Massive Disk Failure Deja Vu, Part 2

So far, no further problems with this system.

Some corrections to the events reported in Massive Disk Failure Deja Vu, and followups on what we can or can't do to recover faster.

Monday, October 25, 2010

A different angle on RTO, RPO, Backups and Restores

When people design IT backup and restore processes, they typically focus on Recovery Point Objective (RPO) and Recovery Time Objective (RTO), with the goal of reducing both as much as the organization can afford.

Two hidden assumptions:
  • You instantly know when you have a problem.
  • You instantly initiate recovery.
Those assumptions are not always true!
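To see why those two assumptions matter, here is a rough sketch in Python. The class and field names are made up for illustration, and the model is deliberately simple: it assumes any work done between the failure and the start of recovery is lost along with everything since the last good backup. Real incidents are messier.

```python
from dataclasses import dataclass


@dataclass
class RecoveryScenario:
    """Hypothetical model: all fields are illustrative, in hours."""
    rpo_hours: float         # age of the newest usable backup at time of failure
    rto_hours: float         # time to restore once recovery is actually running
    detection_hours: float   # time until someone notices the problem
    initiation_hours: float  # time between noticing and starting recovery

    def effective_downtime(self) -> float:
        # The clock starts at the failure, not when the restore begins.
        return self.detection_hours + self.initiation_hours + self.rto_hours

    def effective_data_loss(self) -> float:
        # Assumption: work done after the failure is lost too.
        return self.rpo_hours + self.detection_hours + self.initiation_hours


# What the plan assumes: instant detection, instant initiation.
on_paper = RecoveryScenario(rpo_hours=24, rto_hours=4,
                            detection_hours=0, initiation_hours=0)

# Same RPO and RTO, but nobody notices for 3 days and it takes 8 hours to start.
in_practice = RecoveryScenario(rpo_hours=24, rto_hours=4,
                               detection_hours=72, initiation_hours=8)

print(on_paper.effective_downtime(), on_paper.effective_data_loss())
print(in_practice.effective_downtime(), in_practice.effective_data_loss())
```

With these made-up numbers, the plan promises 4 hours of downtime and 24 hours of lost work, while the delayed case comes out to 84 hours and 104 hours, even though RPO and RTO never changed.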

Friday, October 22, 2010

Massive Disk Failure Deja Vu

Last year, we suffered through extended downtime and data loss on our primary server due to multiple disk failures on a large ZFS RAID-Z2 array. The disk array vendor found no trouble in the array or drives, and my confidence in ZFS was badly shaken. If you need to refresh your memory: Part 1, Part 2, Part 3, Part 4.

It almost happened again a few weeks ago, with some bizarre new twists.

Thursday, October 21, 2010

Love OpenDNS!

I've been using free OpenDNS Basic at home for many months now.

We recently switched our Comcast visitors network at work to OpenDNS FamilyShield to enhance visitor security and reduce our liability for visitor misbehavior.

Sunday, October 3, 2010

Even More on new Data Center HVAC

After another meeting with our HVAC subcontractor, I better understand the HVAC design for our new data center.

We will have two integrated systems; each system will have a blower added so we can move the CFM we require.

This design is even simpler than what I previously understood.

IPv6 transition ordered

We've been ordered to transition to dual-stack IPv6 plus IPv4 down to the client level in less than 4 years.

This will be a financial, operational, and security disaster.

Data Center Consolidation Gets Worse

Our data center consolidation goals were finally revealed to us - four months after the decision was made.

Current:  350 "data centers" (3 or more servers)

Goal: 5 data centers

Impossible.  Just impossible.

Friday, August 20, 2010

Data Center Consolidation?

The Powers That Be have decided we have too many data centers, and we must consolidate. No doubt they will set a goal like 50% reduction, pulled out of thin air or from an airline's in-flight magazine.

Let's consider some of the hard problems with data center consolidation:

More on our new Data Center HVAC

I had more time to digest the HVAC proposal for our new data center and wanted to explain the new design.

Wednesday, August 18, 2010

100 mile Backpack Trip in Santa Cruz Mountains

You can backpack for 10 days in a loop in the Santa Cruz Mountains, between Silicon Valley and the Pacific Ocean, hiking 8-12 miles per day. This trip is roughly 100 miles, depending on which trails you take, and which maps and signs you believe!

August 6, 2014: Portola Redwoods State Park has no piped drinking water until further notice. Due to severe drought, you should check all water sources before depending on them.

Trip Overview
Image from Google Earth

Tuesday, August 17, 2010

Comcast update

In my first Comcast post, I was surprised at the range and price/performance of Comcast's offerings in our location, and thought we might take advantage of some of them.

Time for an update:

Staying warm on the river

On a guides' mailing list, we were asked for tips on keeping paddler guests warm late in the day.

Here are some of mine:

Thursday, July 8, 2010

Fiber to the desktop? Not anytime soon

For decades, some vendors have pushed fiber to the desktop, instead of copper.

The advertised advantages of fiber have been:
  1. Fiber is "future proof" – you don't need to rip and replace every 10 years
  2. Fiber has higher bandwidth
  3. Fiber is immune to RFI
  4. Fiber is almost impossible to tap
  5. Fiber can run longer distances easily
In reality:

Sunday, July 4, 2010

Design choices for new space

Made several important design choices for our new space over the past few weeks. Some of them surprised me.

Small green data center designs

I had a very long, hard road designing my new, small data center. All the visible activity is in large data centers.

Ironically, if cloud computing really takes off, we could see lots of small data centers, with fat pipes to clouds.

Some tasks won't move to the cloud, ever. We'll need small data centers for those tasks.

Would be nice if someone created guidelines for designing small, green, data centers.

If I had a few months uninterrupted, I would write those guidelines, maybe as a Wiki so everyone could add knowledge.

Long Time Gone

I know, everyone has lame excuses for not posting blog updates for months.

In my case, I really did get really sick and almost die -- twice.

Sorry, no sordid details. I'd like to maintain some privacy.

Thursday, February 25, 2010

Massive Disk Failure, Part 4

Our disk array vendor exercised and tested both the old array and the failed drives for three weeks.

No trouble found.

Saturday, January 16, 2010

Simple Change Management system pays big dividends

A few years ago, we implemented a simple change management system within our IT support group. We had seen problems with mismanaged changes, but we didn't need some elaborate expensive scheme complete with a Change Control Board as advocated by Big IT schemes.

How does it work?

Wednesday, January 13, 2010

Focus on the big picture

One of the new fallacies of the Googleized Intertubes is that anyone can become an expert on a topic with enough searching and reading web pages and listening to podcasts and watching videos.

Saturday, January 9, 2010

"Call and response" on the river

I've taught hundreds of boats full of whitewater paddlers how to paddle. The basic paddle commands for Class III whitewater are:

Massive Disk Failure, Part 3

We've had a pretty good relationship over 10 years with the disk drive chassis vendor at the heart of this problem. We thought we had our bases covered with their extended warranty promising "overnight advanced replacement" of failed components.

Not quite. Try 2+ weeks to get the chassis replaced.

The problems cited by the vendor:
  • All of this happened in late December, and many of their staff were on "vacation" (or furloughed, we're not quite sure).
  • They did not have our particular chassis sitting on the shelf ready to ship. They had to build one.
  • After building our chassis, they had to "qualify" the chassis to make sure it worked right.
  • The chassis was ready to ship New Year's Eve. Oops. Chassis actually shipped on the Monday after.
Not much we can do about this. Our vendor let us down. Realistically, all we can do is not buy from them in the future.

Massive Disk Failure, Part 2

Most of the dust has settled, though we are still waiting for root cause analysis from our drive/chassis vendor. They found nothing wrong with the first two drives that failed, but apparently this is a common finding.

Recovering from backup tapes was far more painful than anticipated. Despite weekly backups to tape, several people lost 2-3 weeks of work for various reasons: