In my last blog I talked about why NFS was “recommended” and mentioned that I’d talk about LUN sizing with block-based protocols in a future blog…well, this is it! Have you ever wondered why organizations create LUNs a certain size? Or why some organizations put a single Virtual Disk Image (VDI) on a LUN while others put 20 VDIs on a LUN? Where does that number 20 even come from? Is there a performance or scalability difference, in terms of the number of VDIs per LUN, between an LVM-based LUN and a VMFS-based LUN? In this article I’m going to attempt to explain how organizations size LUNs, how they come up with these “magic numbers” and, most importantly, how this applies to a XenDesktop implementation with PVS so you can properly architect your storage infrastructure for optimal performance. I’ll wrap it up with some guidance on how to size LUNs for the infamous “write cache drives”.
But before I begin, in typical Consulting fashion, here’s what I’m NOT going to cover:
- Basic concepts such as what LUNs, VMFS or LVM are.
- Sizing file-based VHD storage repositories or datastores using EXT (local) or NFS (shared). I’m really only going to talk about how to carve up LUNs on shared storage arrays that are accessed via block-based protocols (and really only FC/FCP), since that’s what I see most in the enterprise.
- Hyper-V and CSV sizing. I’m going to keep this discussion to XS and ESX/vSphere only…to be honest, I don’t know enough about Hyper-V and CSV since we haven’t seen them much in the field yet.
- Advanced or vendor-specific concepts such as metaLUNs, FlexVols, VMFS volumes that span multiple LUNs, etc. I realize that different vendors have innovative and sophisticated methods of abstracting and aggregating LUNs. Every vendor also has different parallelism techniques to distribute and commit the LUN contents across physical spindles/drives. The goal is to keep this article vendor and array agnostic…so we’re starting with the basic building block – the “base LUN”.
- There’s a bit more to this “fuzzy math” of determining the optimal LUN size or the optimal number of VDIs per LUN (such as block size, partition alignment, etc.), but I’m trying to keep this simple on purpose. I encourage you to always engage your storage vendor and the smartest Citrix person you know whenever you are doing this exercise.
Now that I’ve done some CYA and covered what’s out of scope, let’s get to it! I’ll start by saying there are countless whitepapers, articles, blogs and ramblings all over the Internet on this exact subject. The question of “What is the best LUN size?” comes up at least once a day in the VMware forums (case in point here, here and here) and we’re starting to see it in our forums more and more. But sadly, a lot of the advice in these forums is a bit all over the place, outdated and sometimes even incorrect! Luckily there are other, more authoritative articles and whitepapers on this subject. A few that I personally recommend are this, this, this (Appendix C), this and this. And you might be thinking…why is this guy recommending older VMW and EMC whitepapers? The first reason is that most of the concepts still apply today. The second reason I’m telling you to visit our friends over at VMW and EMC is that they’ve been doing this a lot longer than we have and there is a ton more information available on the subject…there is a reason that two of those VMW threads I mentioned earlier have been viewed over 20,000 times each.
Now that you’ve spent the last 10 hours reading up on all those threads, blogs and whitepapers I recommended, you probably know everything by now so you can stop reading. But if you just glossed over them or decided to update your Facebook status instead, keep reading. In the end, determining the number of VDIs or VMDKs that should be housed on a single LUN really depends. (Like you didn’t see that statement coming!). But why does it depend? Well, since you read all those resources you now know it depends on factors such as workload type, I/O pattern, HBA type and LUN queue depth, SCSI reservations, VM size, number of ESX/XS hosts in a cluster/pool and a slew of other array-specific items that I’m not even going to attempt to cover. Sound complicated? It certainly can be, but luckily we’ve been doing this for a while and some smart storage gurus have come up with simple formulas and calculators that make this a little easier. I’m not going to get into mathematical formulas or regurgitate what’s in those whitepapers and blogs, but I am going to do you a favor and summarize the results or what is considered “consensus”, as well as tell you what I’ve seen in the real world. So keep reading…you just might get some numbers out of a Consultant.
Most people seem to agree that 10-25 VMDKs per LUN is the “sweet spot” or “magic number”, and that typically results in LUN sizes anywhere from 300-700 GB. Before you freak out, please keep in mind those are just AVERAGES, and 500 GB LUNs with 16 VMs per LUN certainly won’t work in every situation. But that is what I’ve seen most when we’re dealing with typical server workloads, mixed I/O patterns, ~20-25 GB VMs and a typical number of ESX or XS hosts in a cluster or resource pool. So if you do some quick math, you’ll see where that 500 number comes from (even though 16 × 20 GB or 16 × 25 GB is only 320 or 400 GB, we always want to leave some room for swap, snapshots, overhead/error, etc.). The trickier number, which I referred to earlier as the “magic number”, is the number of VMs per LUN, and that’s how we really arrive at our LUN size. As with most things in engineering, there’s a trade-off, and this time it’s between performance and manageability. If you have smaller, less I/O-intensive workloads, you’re going to be on the higher end of that VMs-per-LUN range. But if you have bigger VMs or more I/O-intensive workloads, you’re going to be on the lower end of that range. In fact, some workloads might even require their own LUN! It’s somewhat uncommon, but you will see customers create a raw device mapping to a LUN (“RDM” in the VMware world) when the workload is extremely “heavy”, the VM is giant (think file server, SQL, etc.), we’re constantly doing snapshots/backups or performance simply must be guaranteed. But the LUN-per-VDI or RDM model is a nightmare to manage as we scale out. On the other hand, you wouldn’t want to create one giant LUN that is several terabytes in size and put all your VMs on it: you’d get awful performance! And why is that? I’m going to tell you…first let’s talk VMFS/ESX and then we’ll talk LVM/XS.
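If you want to play with that back-of-the-envelope math yourself, here’s a tiny Python sketch of it. Fair warning: the 25% headroom for swap, snapshots and overhead/error is my own assumption (it just happens to turn 16 × 25 GB = 400 GB into the ~500 GB figure), and the 50 GB rounding step is arbitrary; neither comes from any vendor whitepaper.

```python
import math


def lun_size_gb(vms_per_lun: int, avg_vm_gb: float, headroom: float = 0.25) -> float:
    """Raw capacity for the VMs on one LUN plus a headroom fraction.

    headroom is an ASSUMED allowance for swap, snapshots and overhead/error.
    """
    if vms_per_lun <= 0 or avg_vm_gb <= 0 or headroom < 0:
        raise ValueError("VM count and size must be positive; headroom must be >= 0")
    return vms_per_lun * avg_vm_gb * (1 + headroom)


def round_up(size_gb: float, step_gb: int = 50) -> int:
    """Round a LUN size up to a tidy increment (the step size is arbitrary)."""
    return math.ceil(size_gb / step_gb) * step_gb


# A few points inside the "typical" ranges discussed above.
for vms, vm_gb in [(16, 20), (16, 25), (25, 20)]:
    raw = vms * vm_gb
    print(f"{vms} VMs x {vm_gb} GB = {raw} GB raw -> "
          f"{round_up(lun_size_gb(vms, vm_gb))} GB LUN")
```

Notice that all three land inside the 300-700 GB band; the VMs-per-LUN number is the real decision, and the LUN size just falls out of it.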
When you lay down VMFS on a piece of storage, it obviously adds some overhead…and that’s why I’ve seen VMW Consulting (and other storage experts) in the past recommend somewhere around 16 VMs/LUN. Why 16 and why not 1600? Because now we have to deal with fun things like SCSI commands, queuing and the ESX hosts locking the LUN when they need access to storage for certain activities such as snapshots. Keep in mind VMW (and the storage vendors on the array side) have made some giant strides in this area: they’ve continually “upped” the number of SCSI commands that ESX hosts can support, and SCSI reservation conflicts are becoming less and less of a problem (did I just compliment VMware?!?). But that doesn’t mean these conflicts have gone away or the problem has been completely eliminated…it just means the sweet spot or magic number might be more like 20 VMDKs/LUN with a mixed I/O pattern and “average” VMs these days.
Flip it over to XenServer, and even though we don’t have something like VMFS, we do have LVM/LVHD, which most experts agree has less overhead than VMware’s clustered file system. The last time I talked to my Xen gurus and smart storage buddies like Steve Jordahl, Steve said our “sweet spot” on XS (and specifically the LVMoHBA SR type we use with FC) is 20-30. So the number is a bit higher with LVM than with VMFS, and we don’t have the same locking issues associated with ESX/VMFS since we don’t use reservations, but we still have SCSI commands and queuing to deal with. That’s because each LUN is represented on the host as a SCSI queue. And when you have multiple VDIs on a shared LUN, all I/O has to serialize through the queue and only one virtual disk’s traffic can traverse the queue at any point in time (which leaves all other virtual disks’ traffic waiting in line!). Now keep in mind we can configure things like active/active multipathing to give us multiple/parallel queues (and the data transport layer will also do its best to buffer and “batch” commands as opposed to sending every single SCSI command one by one). But my point is that these queues are finite and there can certainly be contention, especially during certain operations. That’s why we sometimes want to be conservative and carve out more, smaller LUNs as opposed to fewer, bigger LUNs…so we can get the best performance and make sure all of our SCSI commands get through these queues in a timely fashion. If they pile up, that’s when we get slow response times, high latency and awful performance for the end-user.
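To make the queue argument a bit more concrete, here’s a toy Python calculation (definitely not a performance model) of how many queue slots each VDI gets on average when the same number of desktops is spread across fewer, bigger LUNs versus more, smaller ones. The per-LUN queue depth of 32 and the single active path are hypothetical placeholders; the real values depend on your HBA, driver and multipathing configuration. It also assumes every VDI on a LUN is busy at the same moment, which is a worst case.

```python
import math
from dataclasses import dataclass


@dataclass
class LunLayout:
    total_vdis: int
    vdis_per_lun: int
    queue_depth: int = 32   # HYPOTHETICAL per-LUN queue depth; check your HBA/driver
    active_paths: int = 1   # active/active multipathing can add parallel queues

    @property
    def lun_count(self) -> int:
        # Every VDI needs a home, so round up to whole LUNs.
        return math.ceil(self.total_vdis / self.vdis_per_lun)

    @property
    def slots_per_vdi(self) -> float:
        # Average queue slots per VDI on a full LUN when all of them are busy at once.
        return self.queue_depth * self.active_paths / self.vdis_per_lun


# 600 desktops: one giant LUN vs. the VMFS and LVMoHBA rules of thumb.
for per_lun in (600, 30, 20, 16):
    layout = LunLayout(total_vdis=600, vdis_per_lun=per_lun)
    print(f"{per_lun:>3} VDIs/LUN -> {layout.lun_count:>2} LUNs, "
          f"{layout.slots_per_vdi:.2f} queue slots per VDI")
```

The only ways to improve the per-VDI share in this toy are the ones described above: more queues (more LUNs or more active paths) or fewer VDIs behind each queue. The cost is more LUNs to manage, which is exactly the performance vs. manageability trade-off.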
Fast-forward to 2011 and we don’t just have server VDIs with sequential read patterns on RAID5 storage anymore…so 500 GB LUNs would be an ultimate FAIL, and I’ve seen customers make this mistake when implementing XenDesktop with Provisioning Services on ESX and XenServer. The write cache drives are a perfect example (side note: we may have other things besides the PVS wC file on these secondary drives, such as the page file, EdgeSight data, event logs, etc., but many of us just call these drives attached to each VM the “write cache drive”). If we have, for example, 5 GB wC drives attached to a bunch of Win7 VMs, we want much smaller LUN sizes (100-150 GB using the “20-30 Jordahl rule of thumb”, i.e. 20-30 drives × 5 GB), and we also want to place those VDIs on a disk array optimized for writes since the I/O is almost 90% random writes. One of our Synergy Innovation Award finalists is another great (real-world) example: they have 3 GB drives attached to each Win7 VM for the wC, among other things. We carved out 100 GB LUNs with a RAID10 config on the backend…so they are on the higher end of that spectrum (~33 VDIs per LUN, since 100 / 3 ≈ 33), but that’s because we are using XS, only have about 6 XS hosts in each pool, the desktops have been optimized according to best practices, and we don’t have AV in these desktops, so their IOPS are actually very low: 7 average IOPS/desktop, if you can believe it! (And please don’t ask about the lack of AV…that’s another story for another day.) But if the workloads were more intense, we had more hosts in each pool or we were laying down VMFS instead, we probably would have placed only 20 or so on each LUN.
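The write cache math above is really the same rule of thumb run in both directions, so here’s a minimal Python sketch of it. It assumes the LUN holds nothing but wC drives and ignores any SR/LVM metadata overhead; the default 20-30 range is the LVMoHBA figure quoted earlier, so drop it to ~16-20 if you’re laying down VMFS.

```python
def wc_lun_range_gb(wc_drive_gb: float, low: int = 20, high: int = 30) -> tuple[float, float]:
    """LUN size range for a given write cache drive size and VDIs-per-LUN range."""
    return wc_drive_gb * low, wc_drive_gb * high


def vdis_per_lun(lun_gb: float, wc_drive_gb: float) -> int:
    """How many whole write cache drives fit on a LUN (no overhead reserved)."""
    return int(lun_gb // wc_drive_gb)


# 5 GB wC drives with the "20-30 Jordahl rule of thumb"
print(wc_lun_range_gb(5))
# The customer example: 100 GB LUNs holding 3 GB wC drives
print(vdis_per_lun(100, 3))
```

Going from drive size to LUN size is the planning direction; going from LUN size back to VDIs per LUN is a handy sanity check that an existing layout hasn’t drifted above the range you intended.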
So I know that was a long diatribe, but hopefully that helps shed some light on why people have created ~500 GB LUNs in the past and where that magic number of VDIs per LUN comes from. The moral of the story is somewhere around 20 VDIs or VMDKs per LUN is a safe bet if you know nothing else! I also sincerely hope this information helps you size your storage infrastructure correctly on future XenDesktop projects. Because if you don’t (and I’ve seen this time and time again…), your users might complain about “slow” performance and your VDI project might come to a screeching halt.
Good luck out there and feel free to drop me a comment below if you liked the article.