Last year, some of you followed the physical migration of my CloudPlatform demo cloud with great interest on the XenServer Facebook page. Some even commented on how cool the storage I was using looked.
Unfortunately, as anyone who has had to deal with datacenter hardware knows all too well, servers which are running might not start back up if powered down, and this is no less true for storage controllers.
As it turned out, one of the controllers in my storage array failed, and it proved just a little bit harder to get it replaced than I had anticipated, so off I went to find a suitable replacement. Before we go too far down my decision process, it’s probably a good idea to review what the two most common storage options are in cloud, and why you might want to choose one over another.
Local storage is by far the simplest of the choices. After all, most servers come with at least one disk, and you usually have the option to add several more. Typically local storage is used in an effort to control storage costs, and with decent shared storage starting in the tens of thousands of dollars, there is the potential for some savings.
Well, up until you understand IO, that is. Every spinning disk has a spindle, and the amount of random IO you can get out of a set of spinning disks is a function of their rotational speed and the number of spindles (that is, disks) sharing the load. If you are the sole user of the storage, the number of spindles doesn’t matter too much, but as soon as you have multiple users (aka VMs), things can slow down quickly.
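To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes one random IO costs about one average seek plus half a rotation, and it ignores caching, queueing and RAID write penalties. The function names and the 3.0 ms seek time are my own illustrative assumptions, not measurements from my demo cloud.

```python
def estimate_disk_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for a single spinning disk."""
    # Average rotational latency is half a revolution, in milliseconds.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # One random IO costs roughly one seek plus one rotational wait.
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def iops_per_vm(rpm: float, avg_seek_ms: float, disks: int, vms: int) -> float:
    """Split the aggregate random IOPS of `disks` spindles evenly across `vms`."""
    return estimate_disk_iops(rpm, avg_seek_ms) * disks / vms


if __name__ == "__main__":
    # A 15k drive with an assumed 3.0 ms average seek, shared by ten VMs.
    print(estimate_disk_iops(15_000, 3.0))
    print(iops_per_vm(15_000, 3.0, disks=1, vms=10))
```

Under these assumptions a single 15k disk lands somewhere around a couple of hundred random IOPS, which is not much once ten VMs are sharing it.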
Of course SSD is always an option, but with enterprise SSD costing 5-10 times what a SAS 15k drive of the same capacity does, SSD for local storage isn’t really a cost leader. More importantly, local storage also historically came with an implicit limitation: VMs can’t readily migrate between hosts. Thankfully, the latest versions of both vSphere and XenServer effectively address this problem.
In server virtualization, shared storage is typically used to allow for more effective host utilization. If you need to start a new VM, there is no real way to predict which host in a cluster might have the free capacity, but with shared storage the host selection process can be disconnected from the storage management problem.
This is really good because anchoring the storage to a shared storage solution allows for more advanced functionality like automatically restarting VMs if the hardware should fail. Regardless of whether you use file (NFS) or block (iSCSI) based storage, the IO available to you is a function of the number of disks, their speed and how efficient the storage array is at handling those IO requests.
The problem with traditional shared storage is that controllers don’t understand the type of IO they are being asked to deliver. To them, a database query and a starting VM are pretty much the same, and that leads to a serious problem in the cloud.
How I Arrived at SolidFire
When you look at the state of the world in storage arrays, the core trend today is greater and greater IOPS. This is wonderful for the storage guys, but organizations are actually over-buying IOPS based on predictions for peak IO requirements. In the world of IaaS, this is made worse due to a lack of control over the IO demands each cloud tenant has.
Effectively, if careful storage design isn’t done, the IO usage of one account could lead to a second account becoming IO starved. SSDs offer a ton of IO, but that still doesn’t solve the core problem of IO control.
Enter the guys from SolidFire.
Yes, the SolidFire Storage Solution is SSD based, which is cool. Yes, it offers a ton of IOPS capacity, but it goes one level further. With SolidFire, you actually specify the IOPS you need on a per LUN/volume basis, and associate it with an account. This allows some pretty granular controls, but more importantly allows you to clearly establish an SLA on the storage side, and ensure that if someone attempts to abuse the array, the impact on other tenants is easily manageable.
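To illustrate the general idea of a per-volume IOPS ceiling, here is a minimal token-bucket sketch. To be clear, this is not SolidFire's implementation or API. The `VolumeLimiter` class and its `max_iops` and `burst` parameters are hypothetical names I made up to show how capping one volume keeps a noisy tenant from starving a quiet one.

```python
class VolumeLimiter:
    """Token bucket capping one volume's IO rate (hypothetical model)."""

    def __init__(self, max_iops: float, burst: float):
        self.max_iops = max_iops   # sustained IO operations allowed per second
        self.burst = burst         # bucket size: IOs allowed in a short spike
        self.tokens = burst        # start with a full bucket
        self.last_time = 0.0       # time of the most recent request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one IO may proceed at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last_time)
        self.last_time = max(self.last_time, now)
        # Refill at max_iops tokens per second, never above the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.max_iops)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Each volume gets its own bucket, so tenants are isolated from each other.
    noisy = VolumeLimiter(max_iops=100, burst=10)
    quiet = VolumeLimiter(max_iops=100, burst=10)
    admitted = sum(noisy.allow(0.0) for _ in range(50))
    print("noisy tenant IOs admitted:", admitted)
    print("quiet tenant still served:", quiet.allow(0.0))
```

Because every volume has its own bucket, the noisy tenant hammering its volume has no effect on what the quiet tenant is allowed to do. A real array would also need to queue or delay the rejected IOs rather than simply refuse them.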
As cool as that is, it’s still not the full story. I’m pretty well known for being a XenServer guy, and I’ll freely admit that one of the bigger challenges I’ve had over the years has been thick provisioning on block based storage.
Now I have nothing against NFS, and honestly do use it for some of my storage in the demo cloud, but I definitely prefer iSCSI when it comes to storage management. Here’s where the SolidFire solution really got my attention.
Under the covers, they natively perform thin provisioning, data deduplication and compression on each of the blocks, across LUNs. In real life this means that despite the fact that I’ve requested a 20GB disk from the cluster, I am likely to be using far less than that, and while XenServer thinks it has the full 20GB, the cluster knows better. Since I’m running a cloud, there is a ton of commonality between my templates and deduplication is a wonderful addition.
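If you haven't seen block-level deduplication before, the toy sketch below shows the basic idea: blocks are stored once, keyed by a hash of their content, and each volume just holds references. This is my own simplified illustration, not how SolidFire actually does it. The `DedupStore` class and the 4096-byte block size are assumptions, compression is left out, and old blocks are never garbage collected.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an assumed block size for this sketch


class DedupStore:
    """Toy content-addressed block store shared by all volumes."""

    def __init__(self):
        self.blocks = {}    # sha256 digest -> block bytes, stored only once
        self.volumes = {}   # volume name -> {block index: digest}

    def write(self, volume: str, index: int, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content from any volume maps to the same stored block.
        self.blocks.setdefault(digest, data)
        self.volumes.setdefault(volume, {})[index] = digest

    def read(self, volume: str, index: int) -> bytes:
        digest = self.volumes.get(volume, {}).get(index)
        # Blocks never written consume no space and read back as zeros.
        return self.blocks[digest] if digest else bytes(BLOCK_SIZE)

    def logical_bytes(self) -> int:
        """Bytes the hosts believe they have written."""
        return sum(len(m) for m in self.volumes.values()) * BLOCK_SIZE

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    template = [bytes([i]) * BLOCK_SIZE for i in range(3)]
    for vm in ("vm1", "vm2"):          # two VMs cloned from the same template
        for i, block in enumerate(template):
            store.write(vm, i, block)
    print(store.logical_bytes(), store.physical_bytes())
```

Two VMs built from the same template write six blocks between them, but only three unique blocks end up stored, which is exactly why common templates in a cloud dedupe so well.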
Here’s the final, and arguably key point. Since I was replacing storage, I could have taken the easy route and just got the current version of my existing array. Nice, simple, and drop it right in. Instead, I chose to look at exactly how my cloud was being used, and see if there wasn’t a better solution in the market. My key pain points were controlling IO utilization based on unknown workloads for the next several years, and being able to ensure that I wasn’t going to run out of storage capacity any time soon. SolidFire delivered on these, and that’s why my cloud is now happily running on SSD.
SolidFire is a Citrix Ready partner in cloud solutions, and if you’d like to learn more about the solution, the Citrix Ready folks are hosting a joint webinar with SolidFire on March 13th, 2013 that anyone is invited to. Just register here: http://www.citrix.com/cms/ready/solidfire/