Citrix TIPs Series: Scalability best practices Q&A

Last week, I joined Login VSI Product Manager Mark Plettenberg to present the latest updates and leading practices related to scalability. In case you missed it, the webinar recording is available on our Technology in Practice homepage.

The event included a live Q&A at the conclusion of the webinar, but Mark and I ended up with a few more questions than we could get to. I’ve included those questions (and answers) here, in addition to the questions we received the most during the event. Enjoy!

What is the recommended Azure instance type for Citrix Virtual Apps and Desktops workloads?
A few years ago we routinely recommended the D series in a scale-out fashion for our popular Citrix Virtual Apps and Desktops workloads, based on performance and price. But as shown during the webinar, the compute-optimized F series is now “winning” more often because it delivers the best performance at the lowest price per user per hour. At the time of this writing, we recommend the “F16s_v2” instance, a larger VM with 16 vCPUs and 32 GB RAM. So we’ve effectively changed our leading practice in this area: scale up with larger F’s vs. scale out with smaller D’s. Check out this article for more details, including the recommended instance types for AWS and GCP.
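To make the “price per user per hour” comparison concrete, here is a minimal sketch of the arithmetic. The prices and user densities are hypothetical placeholders, not real Azure pricing or webinar results; plug in your own instance price and the density you measure in testing.

```python
def cost_per_user_hour(instance_price_per_hour: float, users_per_instance: int) -> float:
    """Hourly instance price divided by the number of users the instance hosts."""
    return instance_price_per_hour / users_per_instance


# Hypothetical placeholder figures: NOT real Azure prices or measured densities.
candidates = {
    "smaller instance (scale out)": {"price": 0.20, "users": 10},
    "larger instance (scale up)": {"price": 0.70, "users": 40},
}

costs = {
    name: cost_per_user_hour(c["price"], c["users"])
    for name, c in candidates.items()
}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: {cost:.4f} per user per hour")
print("Lowest price per user per hour:", cheapest)
```

The point of the sketch is that a pricier instance can still win if it hosts proportionally more users, which is why measured density matters as much as list price.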

Does the Rule of 5 and 10 apply to GPUs or in a Cloud scenario?
No (to both). The Rule of 5 and 10 is really for on-premises, non-GPU scenarios. Once we introduce GPUs, the rules of the game change dramatically; the magic multiplier of 10 might become 1, for example. We also don’t want to use the Rule of 5 and 10 in the Cloud, since we can’t “see” the underlying hosts or how they handle things like CPU over-subscription ratios. If you’re trying to understand your true density with different Cloud instances, it’s best to test with a tool such as Login VSI.

A lot of this content has been on CPU. What about memory and disk bottlenecks?
We tend to focus on CPU because it is usually the limiting factor in Citrix workloads when we’re optimizing for high user density. Memory tends not to be the bottleneck and is less expensive than CPU; with careful planning, most customers are able to purchase enough RAM for each server. The Microsoft operating system has also evolved over the years, and we actually see a drop in memory usage when going from 2008 -> 2012 -> 2016. Disk tends not to be a bottleneck these days either. We often recommend single-image management technology like PVS or MCS, each of which has features that significantly reduce I/O (write cache in RAM with overflow, MCS IO, etc.).

Antivirus (AV) is a problem in my environment. What’s the general impact of AV and are there any new things to address the scalability impact?
AV can certainly have a big performance impact, especially if there are requirements for real-time scanning or for scheduled scans during business hours or periods of heavy load. The impact varies with each customer scenario, but we recommend configuring AV to minimize its impact on the environment without compromising the efficacy of the solution. Please refer to our latest guidance on Tech Zone for our leading practices and recommendations related to AV.

What was the real-world magic multiplier for XA workloads you presented today?
9.2. As I mentioned during the webinar, this is simply the average of the results from 12 different real-world customer scenarios (vs. 10 in the Rule of 5 and 10). So, even in 2019, I think using the Rule of 5 and 10 is a quick and easy way to estimate Single Server Scalability.
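For a sense of how close 9.2 is to 10 in practice, here is a rough sketch. It assumes the usual formulation of the rule (physical cores in the host multiplied by the magic multiplier); the 16-core host and the function name are hypothetical, chosen only for illustration.

```python
import math


def estimate_users_per_host(physical_cores: int, multiplier: float) -> int:
    """Estimate Single Server Scalability as physical cores times a multiplier, rounded down."""
    return math.floor(physical_cores * multiplier)


cores = 16  # hypothetical host, for illustration only

rule_estimate = estimate_users_per_host(cores, 10)        # Rule of 5 and 10 multiplier for XA
real_world_estimate = estimate_users_per_host(cores, 9.2)  # averaged multiplier quoted above

print("Rule of 5 and 10 estimate:", rule_estimate)
print("Estimate with the 9.2 multiplier:", real_world_estimate)
print("Difference in users:", rule_estimate - real_world_estimate)
```

On this hypothetical host the two estimates differ by roughly 8 percent, which is small enough for a quick first-pass sizing but no substitute for testing your own workload.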

I heard that the ICT-R study, which showed the latest CR performing 25-30 percent better than the last LTSR, is being updated. Is that true?
That is correct. Shortly after the webinar aired they announced that they found a flaw or some bias in their initial testing. While they’re fairly confident the results won’t change much and the latest CRs will out-perform the last LTSR, they’re going to re-test to be 100% sure. You can read more about their announcement here.

What did you say was the sweet spot in terms of uptime in Azure when deciding between Reserved Instances vs. Pay-As-You-Go?
Right around 17 hours. This number is based on pricing from a few months ago (and prices change all the time!), and it assumes a three-year model using the F16 instance type mentioned above. In other words, if your workloads will be up or “on” for more than 17 hours each day, it’s best to go with Reserved Instances vs. Pay-As-You-Go. This sweet spot is likely to change in the future, and we’re hoping to increase flexibility with technology like AutoScale. See the video below.
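The break-even arithmetic can be sketched as follows. This is a simplified model, assuming a Reserved Instance is paid for around the clock whether or not the VM is running, while Pay-As-You-Go is billed only for hours powered on. The two hourly rates are placeholder values picked so the result lands near the figure above; they are not actual Azure prices.

```python
HOURS_PER_DAY = 24


def break_even_hours_per_day(payg_hourly: float, reserved_effective_hourly: float) -> float:
    """Daily uptime at which Pay-As-You-Go cost equals Reserved Instance cost.

    Reserved cost per day = 24 * reserved_effective_hourly (paid regardless of uptime).
    Pay-As-You-Go cost per day = uptime_hours * payg_hourly.
    """
    return HOURS_PER_DAY * reserved_effective_hourly / payg_hourly


def cheaper_option(uptime_hours: float, payg_hourly: float, reserved_effective_hourly: float) -> str:
    """Return which billing model costs less per day for the given daily uptime."""
    payg_cost = uptime_hours * payg_hourly
    reserved_cost = HOURS_PER_DAY * reserved_effective_hourly
    return "reserved" if reserved_cost < payg_cost else "pay-as-you-go"


# Placeholder rates: NOT real Azure pricing.
payg_rate = 1.00      # hypothetical Pay-As-You-Go price per hour
reserved_rate = 0.70  # hypothetical three-year reserved price, expressed per hour

threshold = break_even_hours_per_day(payg_rate, reserved_rate)
print(f"Break-even uptime: {threshold:.1f} hours per day")
print("20 hours/day:", cheaper_option(20, payg_rate, reserved_rate))
print("10 hours/day:", cheaper_option(10, payg_rate, reserved_rate))
```

Because the threshold depends only on the ratio of the two rates, any change in the reserved discount moves the sweet spot, so rerun the numbers with current pricing before committing.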

Cheers, Nick
Nick Rintalan, Principal Architect – Citrix Consulting Services (CCS)
