Let’s face it – we see a lot of VMware out there. And I might see a bit more than most people since I am part of our Consulting team and I primarily work in the large/enterprise space. In all honesty, I’ve probably done about as many XenDesktop on vSphere deployments as XenDesktop on XenServer or Hyper-V deployments. So I’ve had to stay pretty sharp on VMW over the years…and that means staying on top of the major enhancements that affect Citrix in the latest versions (vSphere 4.x and 5.x). So while it might seem weird that I’m writing an article primarily about VMware, I feel it would be a disservice to our customers if I didn’t cover this particular topic, since it’s such a game-changer in my opinion. Before we begin, if you haven’t read my last article on LUN sizing, it would be wise to read it now – this is essentially a follow-up article and I’m going to reference several things from it.
Before we jump into VAAI (and specifically ATS and the impact this important feature has on sizing block-based LUNs), we need to revisit some of the basics related to VMFS and how SCSI reservations have been used in the past for “locking”. As many of you know, VMFS is VMware’s proprietary cluster file system. And while VMFS allows VMW to do some neat things like thin-provision block storage, it also has some special considerations being a distributed file system. Since multiple ESX hosts can effectively “share” a single VMFS-based LUN, something is required to control all of the operations between the ESX hosts. Otherwise you would have serious issues like data corruption if multiple hosts tried to write to the same block on a LUN simultaneously. In order to keep the ESX hosts “in check” and prevent this sort of thing from happening, VMFS uses on-disk locks to synchronize metadata updates. And these types of metadata updates are required when you create, modify or delete files. Earlier versions of VMFS (pre-ESX 3.5) used SCSI reservations to acquire these locks. And while a SCSI reservation is only required for a short duration to obtain the lock, if you have lots of VMs on each ESX host (i.e. like in a XD deployment) and they all reside on the same LUN, you can see how there might be contention and SCSI reservation conflicts become a problem. For this reason (and a few others that I described in my last article on LUN sizing), storage vendors as well as VMW and Citrix have sort of agreed to effectively cap the number of VMs on a block-based LUN to somewhere in the neighborhood of 20-30. So most of us out there have been adhering to this advice of 20-30 VMs per LUN and all is well and good…
But that was then and this is now. Beginning with ESX 3.5, VMW acknowledged this problem of SCSI reservation conflicts and implemented something called optimistic locking. Essentially this postpones acquiring the disk lock via SCSI reservation until as late as possible in the lifecycle of the VMFS metadata update. But what’s maybe more important is that in optimistic mode only one SCSI reservation is used per transaction, as opposed to one reservation per lock. So more intelligent locking and fewer SCSI reservations overall. And this certainly made things better, but you still didn’t see VMW (or us) recommending much more than 30 VMs per LUN in ESX 3.5 deployments.
And then the vStorage APIs for Array Integration (VAAI) came along a couple years ago and started changing the game (but not quite when initially released – I’ll explain in a minute what I mean by that). If VAAI is new to you, you might want to check out the VMW FAQ on the subject. But the idea behind VAAI is simple – enable ESXi hosts to offload specific VM and storage management operations to compliant storage hardware, thus freeing up valuable resources on the host. There are several features or “primitives” within VAAI, but the one that matters for this subject of LUN sizing is ATS (Atomic Test & Set). Beginning with vSphere 4.1 and VAAI-capable arrays, VMFS-3 started using ATS for on-disk locks instead of SCSI reservations. ATS is sometimes referred to as “hardware acceleration” or “hardware accelerated locking”. What does this mean? Instead of implementing locking via SCSI reservations (in software), locking is now offloaded to the array and done in hardware! Under the covers, ATS performs a “compare and write” of SCSI blocks in a single operation using array-specific operation codes. And this enables much more granular locking of block storage devices. In other words, instead of locking an entire LUN via a SCSI reservation, we can now lock only the specific blocks within a LUN that we need – and we lock them via ATS on the array side in hardware, so it’s much faster and more efficient. I hope light bulbs are going off, because this is absolutely going to revolutionize the way we size LUNs with VMFS.
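To make that “compare and write” idea concrete, here’s a toy sketch in Python – my own illustration, not VMware’s code or the actual SCSI opcodes. The array atomically checks that the on-disk lock record still holds the value the host expected (free), and only then writes that host’s ID into it; a second host attempting the same thing simply fails and retries, without anyone ever reserving the whole LUN:

```python
import threading


class LockRecordSketch:
    """Toy model of an on-disk VMFS lock record. The threading.Lock
    below merely stands in for the atomicity the array provides in
    hardware; it is NOT how the real compare-and-write is implemented."""

    def __init__(self):
        self._atomic = threading.Lock()
        self.owner = None  # None means the lock record is free

    def ats_acquire(self, host_id, expected=None):
        """'Compare and write' in one operation: write host_id into
        the record only if it still holds the expected value (free)."""
        with self._atomic:
            if self.owner == expected:
                self.owner = host_id
                return True   # lock acquired – no LUN-wide reservation
            return False      # another host beat us to it; retry later


record = LockRecordSketch()
print(record.ats_acquire("esx01"))  # True  – esx01 grabs the lock
print(record.ats_acquire("esx02"))  # False – record is no longer free
```

The key point the sketch illustrates: the test and the write happen as one indivisible operation, and contention only touches that one lock record, not the entire device.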
But notice I said “not quite” in that last paragraph. What I meant is that although VAAI and ATS debuted in vSphere 4.1, the implementation was kind of half-baked. As one of VMware’s Storage Architects explains in this excellent article on VMFS locking, VMFS-3 in vSphere 4.1 used ATS for only 2 of the 8 total “operations”. So ESX would still fall back to SCSI reservations if one of the other 6 operations noted in that article was required. But the real kicker was that ATS was used ONLY if the on-disk lock was un-contended! So if there was any contention whatsoever (think of a XD deployment with hundreds of VMs on ESX hosts sharing LUNs), it would fall back to SCSI reservations. So even if you had a VAAI-capable array in your XD on vSphere deployment, I bet VMW Consulting was still not recommending much more than 30 VMs per block-based LUN with vSphere 4.1!
Fast-forward to today and VMW is shipping vSphere 5.0 U1 now. And vSphere 5.x uses VMFS-5…and now I have to give VMW props for finally getting it right with VMFS-5. Not only does almost every enterprise array support VAAI now, but VMFS-5 uses ATS for all 8 operations…and even when there is contention or mid-air collisions. That means absolutely no SCSI reservations on VMFS-5 implementations with VAAI-capable arrays. That’s awesome.
But what does it all mean, Basil?!? It means that if you’re deploying XenDesktop on vSphere 5.x with a VAAI-capable array, you should no longer be sizing your block-based LUNs (FC, iSCSI, FCoE) at 20-30 VMs per LUN! Note I said “block-based” LUNs in that previous statement – none of what we’re talking about here (SCSI reservations, LUNs, ATS, etc.) applies to file-based storage protocols such as NFS.
So I’m sure your next question is: how do we size LUNs with VMFS-5 and VAAI? How many VMs per LUN is the sweet spot now? I can tell you that I’ve personally seen a couple of newer XD 5.x on vSphere 5.x deployments with VAAI arrays (both EMC and NetApp) that are well over 30 VMs per LUN. I was just looking at one deployment with about 55 PVS write cache drives on each FC-based LUN…with absolutely zero performance issues and extremely low latency. It was pretty amazing, since I’ve been so used to telling customers to create more, smaller LUNs in the past!
So what is the new magic number with VAAI? I did a TON of research on this subject, asked some colleagues at EMC and VMW, and devoured every piece of literature on this topic I could find on the web. The bottom line is that the guidance varies based on who you talk to:

- In a test done by EMC, they saw 50% less I/O queuing and VMs booted 4 times faster compared to the same config on a non-VAAI-capable array.
- In another test done by HP, they got 6 times as many VMs per LUN with VAAI.
- Hitachi simply says the number of VMs per LUN is no longer a constraint (a tough sell for me personally, but it speaks volumes about the power of VAAI).
- NetApp tested 128 VMs per LUN (essentially 5 times what we were recommending before) and saw no performance degradation with VAAI.
- Other experts like Duncan provide formulas.
- And lastly, and maybe most telling, VMW publishes configuration limits and maximums with every release of vSphere and View. Before the days of VAAI, they recommended no more than 64 VMs per LUN (and remember – we always implemented no more than 20-30 in practice). Now VMW says no more than 140 VMs per LUN with VAAI support.

So if you do the math and essentially cut that 140 in half like we were doing before, the “real” number we might want to implement in practice is more like 50-75 VMs per LUN. After all this analysis and research, it seems the smart money is somewhere around 2-3x what we recommended before. It’s nice to see some of the storage vendors saying 4-6x, but I would start with 50-75 VMs per LUN and only implement anywhere close to 100 or 120 VMs per LUN after rigorous testing in a non-production environment.
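To put rough numbers on the difference that makes, here’s a quick back-of-envelope sketch (my own illustration – the deployment size and densities are hypothetical figures pulled from the ranges discussed above; your IOPS profile and testing should drive the real number):

```python
import math


def luns_needed(total_vms, vms_per_lun):
    """How many LUNs a deployment needs at a given VM density."""
    return math.ceil(total_vms / vms_per_lun)


total_desktops = 1000  # hypothetical XD deployment size

# Old guidance: ~20-30 VMs per block-based LUN (25 as a midpoint)
print(luns_needed(total_desktops, 25))   # 40 LUNs

# New starting point with VMFS-5 + VAAI/ATS: ~50-75 VMs per LUN
print(luns_needed(total_desktops, 60))   # 17 LUNs
```

Less than half the LUNs to provision, zone, and monitor for the same desktop count – that’s the operational win hiding behind the locking change.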
It goes without saying that your LUNs must be able to support the IOPS the VMs are generating. But since we are talking about VAAI-capable arrays (i.e. enterprise storage), I’m betting it won’t be a problem since a lot of times a “LUN” is actually a virtual representation/abstraction of lots of smaller LUNs with maybe 100’s of spindles behind it.
Lastly, before just sizing LUNs with these numbers, please check that the array supports VAAI/ATS for your particular storage protocol and that ‘HardwareAcceleratedLocking’ is enabled on your ESXi hosts (it should be by default, but I would absolutely verify this). Here is a great article with a handy table broken down by storage array and protocol to verify if your array supports VAAI and ATS (“Hardware Assisted Locking” column in the table). And the VMW VAAI FAQ that I pointed you to earlier has the CLI/vCLI commands to validate whether ATS is enabled.
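For reference, these are the sorts of commands I’d run from the ESXi 5.x shell (or via the vCLI) to verify this – the device ID and datastore name below are placeholders you’d swap for your own, and the exact output wording can vary by build:

```shell
# Confirm hardware accelerated locking (ATS) is enabled on the host
# (Int Value should be 1, which is the default)
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking

# Check per-primitive VAAI support for a specific device; find your
# device IDs first with: esxcli storage core device list
esxcli storage core device vaai status get -d naa.xxxx

# Verify the datastore is actually VMFS-5
# (replace MyDatastore with your datastore name)
vmkfstools -Ph /vmfs/volumes/MyDatastore
```

These are host-specific commands, so run them on each ESXi host in the cluster – a single host with ATS disabled can still generate SCSI reservations against a shared LUN.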
I hope this guidance helps. Good luck out there and feel free to drop me a comment below if you liked the article or have a question.
Nick Rintalan, Senior Architect, Enterprise Architecture, Citrix Consulting