Does storage have to be a headache?

One of the biggest challenges we face at work is managing storage. While we don’t run the day-to-day operations of, say, the EMC gear, we are responsible for engineering and maintaining some very high-traffic, mission-critical storage at the host level. Because of our matrix management structure, this means that we have to be experts on SAN, filesystem, OS, and even said EMC gear. Since we also provide storage for databases and “generic” network filesystems, the team has to be aware of the limitations and requirements of both. Throw in an extreme sensitivity to cost, and you have a real engineering challenge.

On the network file side, we’re currently using NFS, warts and all. With high traffic and heavy use (say, SVN or CVS with active development and testing on an NFS mount), things get hairy. File locking seems to be more of a problem on Linux than on Sun once the number of users or the amount of traffic gets high. Keeping very large filesystems in sync across multiple sites is also a lot of work, even with the best Veritas has to offer.

The filesystems and their hosting servers require five nines (99.999%) of availability. That throws in another wrench: high availability usually means a cluster solution, and most cluster solutions are not easy to manage. As you add more levels of complexity to a system, you run into the paradox that more moving parts make your HA solution more delicate.
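To put numbers on that availability target and on the “more moving parts” paradox, here is a rough back-of-the-envelope sketch. The function names are mine, and the model is the textbook one that assumes components fail independently, which real clusters rarely do, so treat the results as illustrative rather than as measurements from our environment.

```python
# Back-of-the-envelope availability arithmetic.
# Assumption: component failures are independent (a simplification).

MINUTES_PER_YEAR = 365.25 * 24 * 60  # about 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget per year for a given availability (0.0 to 1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(components: list[float]) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of N redundant copies where any one copy is enough."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    # The downtime budget at 99.999% availability.
    print(f"five nines: {downtime_minutes_per_year(0.99999):.2f} min/year")

    # Four layers in series, each at five nines on its own
    # (for example host, HBA, fabric, array; the layer list is hypothetical).
    chain = serial_availability([0.99999] * 4)
    print(f"four layers in series: {chain:.6f}")

    # Two redundant nodes that are each only 99% available.
    print(f"two 99% nodes: {redundant_availability(0.99, 2):.4f}")
```

Five nines leaves a budget of roughly 5.26 minutes of downtime per year. Chaining several five-nines layers in series drops the whole stack below five nines, which is the paradox in a nutshell: redundancy helps, but every extra layer the redundancy depends on eats into the budget.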

So, what to do? Right now, I am thinking of using some new technologies, and some not so new, and moving away from the “We’ve always done it this way” mindset. We need to create a new framework, and then layer the technologies on top of that framework. I also want it to be architecturally simple.

Despite the silly name, the GSX 3000 by Yotta Yotta offers a very slick feature set, especially in a high-availability, clustered environment. The ability to abstract storage from the hosts is the first step. One way to think of it is as a RAID array at the cluster level. In a single server, for example, the hardware RAID card abstracts the management of the disks from the OS. That is what I am trying to accomplish here: a kind of RAID 5 at the server level.
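For anyone who has not looked under the hood of RAID 5, the core idea in the analogy is XOR parity: lose any one member and you can rebuild it from the survivors plus the parity. The sketch below shows only that idea, with made-up block contents and helper names of my own. It says nothing about how the GSX 3000 works internally, and a real RAID 5 also rotates parity across members, which is left out here.

```python
# Minimal XOR-parity sketch of the RAID 5 idea.
# Assumptions: equal-sized blocks, a single failure, no parity rotation.
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for a stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks: list[bytes], parity_block: bytes) -> bytes:
    """Recover the one missing block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity_block])


if __name__ == "__main__":
    # Three hypothetical members, each holding one block of the stripe.
    stripe = [b"node-A..", b"node-B..", b"node-C.."]
    p = parity(stripe)

    # Pretend the second member died and rebuild its block.
    recovered = rebuild([stripe[0], stripe[2]], p)
    print(recovered == stripe[1])
```

The point of the analogy is that the hosts above never see any of this; they just see storage that survives losing a member.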

Since we’re a Red Hat shop, using GFS may be our next step. While we have used Veritas since before my arrival, the performance it delivers and the trouble it has caused may not justify continued investment for this project. The sizzle of GFS is that all nodes have concurrent access to the data devices. In my thinking, once that is coupled with the GSX 3000, you have a powerful framework and delivery method.

Finally, we come to AoE (ATA over Ethernet). Given the matrix management I mentioned above, I don’t know if I can get away with this politically. However, I have seen some very impressive cost-to-performance demonstrations of AoE. While I got some very good leads at LinuxWorld, my team does not “own” the storage, so this may be hard to push. If we can pull it off, being able to segregate lower-traffic filesystems from high-demand ones at a huge savings in cost just makes this whole project even more exciting.

Anyway, that’s where my mind is at 0300 on a Sunday morning.
