This is my first ever blog article, so I’m quite nervous and excited at the same time.
Since the announcement of HDX 3D Pro and NVIDIA GRID vGPU at Synergy 2013, I have done many presentations, demonstrations and builds of this technology for customers and partners. The interest is massive, as it clearly addresses a use case that desktop virtualisation previously couldn't manage efficiently. In this blog, I want to share five lessons I've learned from deploying HDX 3D Pro in customer environments, for both XenApp Hosted Shared Desktops and XenDesktop VDI.
Configure XenDesktop policies – minor changes can be the difference between perfect and pathetic
When you configure a machine with HDX 3D Pro and look through the policies available in Citrix Studio, you'll find that there aren't many policies specific to HDX 3D Pro. The most useful ones are more generic and can be applied to any workload in Studio. The five XenDesktop policies I most often tweak for 3D applications are:
- Moving Image Compression
- Visual Quality
- Lossy Compression Level
- Enable Lossless
- Progressive Compression Level
I typically use these five policies to tune the performance of an application over any network access scenario (LAN, WAN, remote access). Policies such as lossless compression and moving image compression help maximise the performance of 3D applications over the WAN. Each application and workload behaves differently, so it is important to test the user experience each time you change one of these settings.
To set the most appropriate policy level, it is important to understand the requirements of the users. If they are WAN users, what is the available bandwidth? How many users are in a site? What is the latency between the site and the virtual desktop? Is a CloudBridge optimising the user experience? Use these criteria to validate your policy settings. In my setups, I reduce image quality and increase compression on moving images when deploying apps for remote access. In this scenario, I have no control over the client connection bandwidth, so I provide the minimum acceptable performance. Tailor your LAN and WAN policies separately, and filter them by site and user for maximum effect.
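To make the LAN/WAN/remote split concrete, here is a minimal sketch of how I think about per-scenario presets. The values below are illustrative assumptions of mine, not Citrix recommendations – you would still set the real policies in Citrix Studio and validate each one against your own applications.

```python
# Illustrative per-scenario HDX policy presets.
# All values are assumptions for demonstration only -- tune the real
# policies in Citrix Studio and test the user experience each time.
POLICY_PRESETS = {
    "LAN": {
        "visual_quality": "High",
        "moving_image_compression": False,
        "lossy_compression_level": "Low",
    },
    "WAN": {
        "visual_quality": "Medium",
        "moving_image_compression": True,
        "lossy_compression_level": "Medium",
    },
    "remote": {
        "visual_quality": "Low",
        "moving_image_compression": True,
        "lossy_compression_level": "High",
    },
}


def preset_for(scenario: str) -> dict:
    """Return the illustrative policy preset for an access scenario."""
    return POLICY_PRESETS[scenario]
```

The point is simply that remote access gets the most aggressive compression because the client bandwidth is unknown, while LAN users get the best image quality.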
Set session bandwidth limits – HDX 3D Pro can sometimes be very bandwidth hungry!
Despite this being just another XenDesktop policy, it took me forever to work out, so I thought I'd save you all the trouble! HDX 3D Pro is quite light on client bandwidth, which makes it really effective in sites with poor connectivity. However, it can also be quite hungry when bandwidth is plentiful – especially in WAN scenarios. When a 3D application is presented to an endpoint, HDX 3D Pro will try to use all of the available bandwidth to further enhance performance, because in most scenarios more bandwidth means a better user experience.
While this may be viable over the LAN, the principle doesn't always hold true over a WAN or in a remote access scenario. Over a WAN, this excessive bandwidth usage can affect other users in the branch site – in fact, it can be detrimental to the session itself. In one test, a desktop with 40 Mbps of available bandwidth, running an NVIDIA GRID K260Q vGPU and the Unigine Heaven benchmark, consumed all of that bandwidth rendering rotating images and then couldn't process subsequent, simpler moving images!
One technique to overcome this is to apply bandwidth-limiting policies to the session. You can do this per virtual channel, or for the entire session. When you set a session bandwidth limit, you cap the maximum usage irrespective of the available bandwidth, so each 3D Pro session has a predictable user experience. The second benefit is that it reduces the chance of impacting other user sessions. In a branch office of 20 people with a 10 Mbps WAN connection, you don't want one person running ArcGIS or AutoCAD degrading the performance of the entire branch.
This policy is extremely effective over the WAN. If you use bandwidth limits, it is important to tailor them to each individual branch site, as well as to the bandwidth usage of each person. XenApp and XenDesktop can filter these policies by user and site, and I highly recommend doing so.
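The arithmetic behind a per-site limit is simple. Here is a back-of-the-envelope helper (my own sketch, not a Citrix tool) that takes the branch link speed, the number of concurrent sessions, and a headroom fraction reserved for non-ICA traffic, and produces a per-session cap you could then enter into the session bandwidth limit policy:

```python
def per_session_cap_kbps(link_mbps: float, users: int,
                         headroom: float = 0.2) -> int:
    """Rough per-session bandwidth cap for a branch site.

    Reserves `headroom` (a fraction of the link) for other branch
    traffic, then splits the remainder evenly across concurrent
    ICA sessions. Returns the cap in kbps.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    usable_kbps = link_mbps * 1000 * (1 - headroom)
    return int(usable_kbps / users)


# The 20-user, 10 Mbps branch above: with 20% headroom reserved,
# each session would be capped at 400 kbps.
cap = per_session_cap_kbps(10, 20)
```

An even split is the crudest possible model – in practice you would give 3D Pro users a larger share than task workers – but it gives you a defensible starting number to test against.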
Look for CPU and memory bottlenecks – poor graphics performance may not always be the GPU
Another common occurrence I've seen is an image rotating jerkily on screen. For example, I was working with an urban planning application recently, and one of its flyover scenarios ran smoothly on the desktop for 3 seconds, then poorly for the next 10 seconds, repeating that pattern for the entire length of the flyover. What I initially suspected was a GPU issue (my first thought: "oh, the card clearly isn't handling it") turned out to be nothing of the sort.
The first thing to understand is the difference between CPU and GPU workloads. As a desktop guy, I took a little while to understand what is processed by the GPU. In a nutshell, actions that involve rotation, reshading and redrawing are handled by the GPU. A dead giveaway in most applications is shadows being redrawn after an image moves – that is the GPU working. Pretty much everything else is handled by the CPU. So be perceptive about where the bottleneck might be: if you see sluggish performance in your application, it's more likely related to CPU or RAM than to the GPU.
When I work with new applications, as a general rule of thumb I provision a client VM with 4 vCPUs and 8 GB RAM as my starting point. Keep in mind that desktop towers with physical graphics cards are often highly specced – a customer once showed me his machine with 16 physical cores and 48 GB RAM (and he had one of the smallest in his team!). From there, I install the application as per the client's requirements and get them to test the performance. Run up perfmon counters, track CPU and RAM usage, and eliminate CPU, memory or storage as the cause of any performance loss in the environment. Scale the VM resources up or down based on how the application performs, and test accordingly.
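When I'm sifting through the perfmon data afterwards, I find a quick summary more useful than staring at raw samples. As a sketch of the idea, the following assumes you've exported a counter log to CSV (for example with `typeperf`, where the first column is the timestamp) and summarises one counter column; the 85% "CPU-bound" threshold is my own rough rule of thumb, not an official figure:

```python
import csv
import io
import statistics


def summarise_counter(csv_text: str, column: str) -> dict:
    """Summarise one perfmon/typeperf counter column from a CSV export.

    Assumes the usual typeperf layout: first column is the timestamp,
    remaining columns are counter paths. Blank samples are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row[column]]
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        # Rough rule of thumb: sustained >85% average suggests the
        # workload is CPU-bound rather than GPU-bound.
        "cpu_bound": statistics.mean(values) > 85.0,
    }
```

If the mean sits near 100% while the application stutters, add vCPUs before you blame the GPU; if CPU and RAM look healthy, only then start looking at the graphics side.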
Baseline the current physical environment – understand how the applications perform on a physical machine
I can't stress this enough – especially with HDX 3D Pro. The vast majority of applications used in the enterprise can be virtualised and published through XenApp, and they take full advantage of the parallel processing capability of the Windows Server that XenApp runs on.
Where this gets tricky is with 3D applications. Many of them are not written with virtualisation in mind; they assume the application is physically installed on an endpoint. As a result, many have not been developed to be multi-threaded and may only use a single CPU, no matter how many resources you throw at them. I find it quite hilarious when an ISV lists its PC requirements as the biggest, gruntiest desktop tower you can buy, only for the application itself to use a single processor. Other applications get it wrong in a different way: they flood the first processor and only use another once the first is completely utilised. For the best chance of a successful transition to a virtualised 3D application, the application should be able to perform genuine parallel processing.
Test how the application performs on a physical PC, and monitor the CPU and memory. This helps you eliminate the GPU as the cause of any performance lag. If the application has a best-practices guide for virtualised environments, use it to tailor the setup. There is usually a set of registry changes and in-application settings that make it perform better when virtualised.
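One quick sanity check during this physical baseline is whether the load spreads across cores or piles onto one. Here is a small heuristic of my own (the 75% threshold is an assumption, not a standard): feed it per-core utilisation samples, e.g. from perfmon's per-processor counters, and it flags the single-core-flooding behaviour described above:

```python
import statistics


def looks_single_threaded(per_core_usage: list, threshold: float = 0.75) -> bool:
    """Heuristic: does the load sit on one core while the rest idle?

    `per_core_usage` is a list of samples; each sample is a list of
    percent utilisation per core at that instant. Returns True when,
    on average, the busiest core carries more than `threshold` of the
    total load across all cores.
    """
    ratios = []
    for sample in per_core_usage:
        total = sum(sample)
        if total > 0:
            ratios.append(max(sample) / total)
    return bool(ratios) and statistics.mean(ratios) > threshold
```

If this comes back True on a 4-core baseline machine, adding vCPUs to the VM won't help – you already know the ceiling is the application, not the infrastructure.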
Citrix is working very closely with ISVs to certify their applications and make them HDX 3D Pro Ready – so they are validated to run on XenApp and XenDesktop.
Make no assumptions – understand the intricacies of your infrastructure
Behind any good XenDesktop deployment, there are infrastructure architects who understand the nuances of their setups: storage, memory, compute, intranet and extranet topologies. This becomes even more critical in the 3D world, as infrastructure issues are masked and perceived to be “3D graphics problems”. Be willing to challenge your normal infrastructure principles, in order to maximise the user experience.
Ensure that proper network design principles are followed when setting up branch sites. Typically, users of 3D applications need a “near-perfect” experience, so design your branch network to handle this. The worst thing IT can do is provide a user with a poor experience on a virtual desktop. They will simply want their old PC back, and won’t care about the benefits of virtualisation.
Consider the behaviour over the WAN, the latency, and any jitter on the network. Jitter is a killer for 3D performance: a small amount can quickly turn a smooth render into something jerky and unusable. As a guideline, around 150 ms of WAN latency (assuming no large fluctuations in latency or jitter) is the upper limit for a non-optimised HDX 3D Pro session. Citrix CloudBridge can certainly assist where there is high latency and a small pipe over the WAN, as it applies QoS to the ICA streams and enhances the performance of these graphics applications.
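If you want to put numbers on that guideline, a handful of round-trip samples from the branch is enough. The sketch below (my own helper; the 150 ms budget comes from the guideline above, and using the standard deviation of RTT as "jitter" is a simple approximation rather than the RFC 3550 smoothed definition) summarises what you'd collect from pinging the virtual desktop:

```python
import statistics


def wan_health(rtt_ms: list, latency_budget_ms: float = 150.0) -> dict:
    """Summarise WAN round-trip samples for an HDX 3D Pro site.

    Jitter is approximated as the population standard deviation of
    the RTT samples. `within_budget` applies the ~150 ms guideline
    for a non-optimised session.
    """
    mean_rtt = statistics.mean(rtt_ms)
    return {
        "mean_rtt_ms": mean_rtt,
        "jitter_ms": statistics.pstdev(rtt_ms),
        "within_budget": mean_rtt <= latency_budget_ms,
    }
```

A site with 100 ms RTT and near-zero jitter will usually feel better than one with 60 ms RTT swinging by 40 ms, so look at both numbers before signing off a branch.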
As a general rule, I run my initial HDX 3D Pro workloads on two different infrastructure stacks: typically one VM backed by a SAN, and another on local disk (typically SSDs). Configure everything identically, and benchmark the performance of the same common steps in each application. This gives me a better indication of where graphics-intensive applications perform best (which may be on local storage), and puts me in a more informed position to decide on my 3D workload architecture.
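When comparing the two stacks, I compare medians rather than single runs, since one slow outlier shouldn't decide your storage architecture. A trivial sketch (my own convention; the timing lists would come from whatever benchmark steps you ran on each stack):

```python
import statistics


def faster_stack(san_times_ms: list, local_times_ms: list) -> str:
    """Pick the stack with the lower median time for identical
    benchmark steps; the median resists one-off slow runs."""
    if statistics.median(local_times_ms) < statistics.median(san_times_ms):
        return "local"
    return "SAN"
```

Whichever way it lands, you now have measured numbers to justify the architecture decision rather than a hunch.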
Hope this helps – happy HDX 3D Pro-ing!
Sales Engineer, Australia and New Zealand