I’ve said this before, but I’ll say it again — best practices should evolve.
We need to constantly evaluate leading practices, so with that in mind, it’s time to change another best practice related to XenApp scalability. If you remember my last article on this topic (from 2015), I suggested altering the default Intel snooping mechanism if you ran into the Haswell-EP chips. Why? Because those particular chips were built in an unusual fashion (resulting in “uneven” NUMA nodes), which presented some challenges when carving up XenApp VMs on a hypervisor. I encourage you to read all the details in that blog post (published about two years ago), but the net-net is that enabling Cluster on Die (COD) in the BIOS allowed us to work around the strange underlying chip and ring configuration. It essentially made the NUMA nodes uniform and allowed us to carve up XenApp VMs like we always had. Fast-forward to today, and what’s changed? Well, Intel is shipping new chips for starters, and the Xeon E5 “v4” chips have made us revisit this particular leading practice.
How do these newer Intel “v4” chips impact XenApp scalability? Quite simply, Intel fixed the problem and made our lives a lot easier. The newer Broadwell-EP dies are split evenly along the rings, so all the cores on each ring map to a uniform NUMA node. That means we no longer need to steal cores or perform tricks like implementing COD. But does that alone fix the problem, especially in our world of highly NUMA-optimized workloads running on a hypervisor? Not quite, and without a second “fix” I might still recommend COD. If you remember back to my 2015 article, there are several snooping mechanisms that ensure cache coherency. One of the reasons we opted for COD was the performance benefit related to local memory latency and bandwidth at the chip level (COD gave us much better performance than the default snooping mechanism on the Haswell-EP chips). This is very important for performance and for maximizing single server scalability (SSS) in our Citrix/Microsoft world. I’m happy to report that Intel did us another solid and made the default snooping mode smarter: the v4 chips introduce a fourth snooping mode called home snoop with directory cache and opportunistic snoop broadcast. Without getting into all the gory details, you can think of it as a cross between home snoop and COD. It’s smarter about keeping things localized (technically, local memory latency is much improved), which is important for our XenApp workloads. The best part is that this new snooping mode is enabled by default on these v4 chips. Thanks, Intel!
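The “uneven vs. even NUMA nodes” point above is easy to sanity-check in code. Here’s a minimal sketch; the core counts below are illustrative placeholders, not the exact ring sizes of any specific Haswell-EP or Broadwell-EP SKU:

```python
def numa_nodes_uniform(cores_per_node):
    """Return True if every NUMA node exposes the same number of cores."""
    return len(set(cores_per_node)) == 1

# Illustrative Haswell-EP-style layout with default snooping: two uneven
# rings, which is the situation COD was used to work around.
print(numa_nodes_uniform([10, 8]))   # False -> uneven nodes
# Illustrative Broadwell-EP-style layout: dies split evenly along the rings.
print(numa_nodes_uniform([7, 7]))    # True -> uniform nodes
```

On a real host you’d feed this the per-node core counts reported by your hypervisor or by a tool like `numactl --hardware`, rather than hard-coded values.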
So, what does it all mean? When you run into one of these 2680 v4s or 2697A v4s in the field (two of the most popular chips we’re seeing right now), you can simply leave the default BIOS settings in place and carve up your XenApp VMs like you normally would. The NUMA nodes are ready to rock and roll. For the 2680 v4, I’d recommend 7 vCPUs per XenApp VM with 8 VMs on each host. And for the 2697A v4, I’d go with 8 vCPUs and 8 VMs on each host. Again, simply leave the new and improved snooping mode intact.
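The arithmetic behind those two recommendations can be sketched as follows. This assumes a dual-socket host with hyper-threading enabled (the article doesn’t spell that out, so treat it as my reading of the math); the core counts are the published specs for these parts: 14 cores per socket on the E5-2680 v4 and 16 on the E5-2697A v4:

```python
def xenapp_vm_layout(cores_per_socket, sockets=2, smt=2, vms_per_host=8):
    """Carve a host into equally sized XenApp VMs that fit NUMA boundaries."""
    logical_cpus = cores_per_socket * sockets * smt
    vcpus_per_vm = logical_cpus // vms_per_host
    # Keep each VM inside a single NUMA node (one socket) so it never
    # straddles nodes and pays remote-memory latency.
    vms_per_node = vms_per_host // sockets
    return vcpus_per_vm, vms_per_node

# E5-2680 v4 (14 cores/socket): 7 vCPUs per VM, 4 VMs per NUMA node
print(xenapp_vm_layout(14))  # (7, 4)
# E5-2697A v4 (16 cores/socket): 8 vCPUs per VM, 4 VMs per NUMA node
print(xenapp_vm_layout(16))  # (8, 4)
```

Note that 4 VMs × 7 vCPUs fills exactly the 28 hyper-threads of a 14-core node, which is why the carving lines up so cleanly.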
As for the results with these v4 chips? As you’d expect, you should see improved density compared to the older Haswell-EP chips. But, in general, the Rule of 5 and 10 still applies. As long as you are optimizing your image and implementing TW+, you should be able to get a little more than 10 users per core with these v4 chips. Exhibit A: FrankA used Login VSI last year with the 2680s and got a little over 10 users per core. And we’re seeing about 10-12 users per core in the field with these v4 chips.
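For those doing capacity planning, the back-of-the-envelope math from the Rule of 5 and 10 looks like this (users-per-core figures taken straight from the field numbers above; the host core count assumes a dual-socket 2680 v4):

```python
def estimated_users_per_host(physical_cores, users_per_core=10):
    """Rough XenApp density estimate: users/core times physical core count."""
    return physical_cores * users_per_core

# Dual-socket E5-2680 v4 host: 2 sockets x 14 cores = 28 physical cores
print(estimated_users_per_host(28))      # 280 users at 10 users/core
print(estimated_users_per_host(28, 12))  # 336 users at the upper 12/core
```

Treat these as sizing starting points, not guarantees; actual density depends on workload, image optimization, and validation with a tool like Login VSI.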
But you probably just heard that we’re about to change our default ICA transport to EDT, right? So, how does moving from TW+ over TCP to TW+ over EDT impact SSS? Well, you’ll just have to wait for my next article around the X-Mas time frame. I’m going to show how EDT impacts scalability and also provide a summary of all the scalability best practices that I’ve written about to date.
Hope this updated guidance helps in the meantime. Cheers.
Nick Rintalan, Principal Architect & Sr. Director of Enterprise Architecture, Citrix Consulting Services (CCS)
Citrix TechBytes – Created by Citrix Experts, made for Citrix Technologists! Learn from passionate Citrix Experts and gain technical insights into the latest Citrix Technologies.