Please note that this blog was written in April 2014. If you are reading it in the future, you should check the support matrix for possible further improvements from NVIDIA and Citrix.

A few months ago we ran an interactive masterclass for XenServer and XenDesktop customers featuring storage configuration software from NetApp and NVIDIA GRID vGPU (true hardware GPU sharing) for XenDesktop with HDX 3D Pro. We always run a live Q&A on these webinars with a team of real software engineers, product managers and solutions experts. Afterwards we review the quality of our answers and look at what Citrix should do long term to clarify the questions that recur. During the EMEA masterclass several questions arose on CUDA support across our different GPU technologies. Additionally, a whitepaper stated in error that vGPU currently supports CUDA, which isn’t yet true. So it seemed the perfect time to clarify the position.

For Citrix products, vendor drivers provide the support for DirectX, OpenGL, OpenCL and CUDA. All of Citrix’s GPU technologies rely on the GPU vendors’ own drivers to avoid an additional layer of support and certification. This ensures that support for DirectX and OpenGL is always as up to date and available as it is on physical servers; this white paper from HP details how support on other hypervisors significantly lags Citrix. We have deliberately avoided synthetic drivers to avoid the serious problems encountered by other vendors on benchmarks such as Redway3D.

OpenCL and CUDA support is also provided by GPU vendors such as NVIDIA. If NVIDIA enable CUDA support for vGPU as they have for GPU pass-through, then Citrix will automatically gain that support.

CUDA – a bit of history

CUDA is very interesting. CPUs have a few cores optimized for serial processing. GPUs, on the other hand, can have hundreds or thousands of smaller cores designed for parallel performance. The “G” in GPU stands for Graphics because this is historically where this type of processor was found to be extremely useful. Standard APIs and libraries such as OpenGL and DirectX were developed specifically to make it easy to write rendering code that could be parallelised, exploiting the potential of the GPU. Typically these rendering tasks involved libraries that could handle and manipulate lots of vector data and facets (think triangles!).

However, rendering is just one of a number of computational loads that perform really well on a GPU. GPUs are perfect for the computational workloads involved in all sorts of problems, e.g. modelling protein folding, stock options pricing, SQL queries and MRI reconstruction. Whilst it is possible to recast these problems into OpenGL or DirectX, and this has been done, there are an awful lot of problems where reformulating them into something represented by triangles is simply an awful lot of faff and obfuscation. And so along came CUDA 1.0 in June 2007: NVIDIA’s proprietary C and C++ extensions, which offer a generic way to optimize code to take advantage of the GPU. It was followed in December 2008 by OpenCL 1.0, and OpenGL gained compute shaders in version 4.3, released in 2012. Plenty of compute programming options to choose from!

The vast majority of computational programmers (outside of graphics applications) use C and C++, so with CUDA NVIDIA extended C to support parallel programming and developed compilers and parallel libraries that translate these massively parallel workloads to run on a GPU. A GPU wants thousands of threads launched (1,000s up to 10,000) so that it is constantly loaded with work. Nowadays an awful lot of software, and the graphical components that software uses, takes advantage of technologies like CUDA. The technologies have now gone beyond triangles and vector algebra: rendering of surfaces and of radiosity caused by material properties is possible, along with a wealth of generic HPC calculations that didn’t map well to graphical models.
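To make the idea of launching thousands of threads a little more concrete, here is a minimal sketch in plain Python. It is only an emulation of the CUDA-style launch model and nothing in it runs on a GPU. The names `saxpy_thread` and `launch` are hypothetical, chosen for illustration, and the block size of 256 is an arbitrary assumption. Each logical thread computes one output element from its block and thread index; on real hardware the two loops in `launch` would be executed in parallel rather than one after the other.

```python
# Plain-Python emulation of how a CUDA-style launch maps threads to data.
# This is an illustration only: it runs serially on the CPU.

def saxpy_thread(block_idx, thread_idx, block_dim, a, x, y, out):
    """Work done by ONE logical thread: a single output element."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(x):                           # guard: the last block may overhang
        out[i] = a * x[i] + y[i]


def launch(n, block_dim, a, x, y):
    """Serial stand-in for a launch over ceil(n / block_dim) blocks."""
    grid_dim = (n + block_dim - 1) // block_dim
    out = [0.0] * n
    for block_idx in range(grid_dim):        # parallel on a real GPU
        for thread_idx in range(block_dim):  # parallel on a real GPU
            saxpy_thread(block_idx, thread_idx, block_dim, a, x, y, out)
    return out, grid_dim * block_dim         # result and logical thread count


n = 10_000
x = [float(i) for i in range(n)]
y = [1.0] * n
result, threads = launch(n, 256, 2.0, x, y)
print(f"{threads} logical threads for {n} elements")
```

The point of the sketch is that the per-thread function contains no loop over the data at all. The parallelism comes entirely from launching one small piece of work per element, which is why problems that decompose this way suit a GPU.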

I personally like this blog post from Jeff Happoldt, CAD kernel guru, investigating how Dassault’s popular ACIS kernel, which sits underneath many popular CAD applications, could use CUDA. Jeff’s blog really helped me understand the power of CUDA by giving the code and problem examples he ran experiments on. It’s great for the CAD industry to see Dassault engineers investigating GPU usage seriously; well worth a read! NVIDIA run a great CUDA development site for developers, which also features CUDA-enabled applications; it is beautifully presented and has all the tools you need. I’d also personally recommend the CUDA by Example textbook published in conjunction with NVIDIA. It’s the best coding book I bought last year and a great introduction to the concepts and mindset required for designing and optimising code for GPUs.

CUDA was traditionally used for High Performance Computing (HPC), intensive number-crunching and really demanding algorithms. As such it’s quite a greedy technology, and in these types of application the algorithms want as much of the GPU as they can get. Nowadays, though, pockets of CUDA are being introduced into a wide range of CAD and 3D applications where a little bit of CUDA is perfect to accelerate performance. It’s this latter use that is most suited to GPU sharing. In practice there is probably a little more development and testing to do to ensure that the most intensive HPC applications can never impact another user, but it’s something both we and NVIDIA are working towards.

Is CUDA supported for GPU pass-through?

Yes, CUDA support is provided by the NVIDIA drivers for GPU pass-through for XenDesktop and XenServer today. You can find NVIDIA’s list of CUDA-enabled GPUs here.

Note: A large number of cards from NVIDIA (including popular choices like the K2000 and K5000) and from other vendors such as AMD are supported for GPU pass-through in conjunction with XenServer.

What kind of applications benefit from CUDA support?

NVIDIA’s CUDA site lists many here. On last week’s masterclass we also saw user comments and questions (yes, we do read all the responses) such as:

Are CUDA and OpenCL supported for vGPU (hardware shared GPU)?

Not today, although I believe it is something that NVIDIA are evaluating. Personally I’d love readers to explain the potential use cases for this, as my first thought was that the majority of serious CUDA and OpenCL application users would be so numerically greedy that they’d prefer pass-through and want their own dedicated GPU. NVIDIA GRID product management have been made aware of this blog and we’ll highlight any proposed use cases and demand to them, so please do shout out if you want this support for NVIDIA GRID vGPU cards!

Is it possible to use CUDA and OpenCL for XenApp – GPU sharing with RDS-like workloads?

Yes, but only experimentally, i.e. it is not fully supported by NVIDIA... yet.

Citrix XenApp offers GPU sharing of RDS workloads as described in my colleague Derek’s blog, here. NVIDIA have implemented experimental support for CUDA and OpenCL in this mode. This means it is not fully supported as a production feature by NVIDIA, but users can choose to trial the technology. Again, please do provide feedback or use cases in this blog’s comments or direct to NVIDIA if you feel this would be a useful production feature.

To enable the experimental support you will need to change a Registry Key by following the instructions in this article:

In practice it’s my belief that this type of sharing is best suited to applications with pockets of CUDA and not to full-blown HPC uses. Before taking this feature to full support, both NVIDIA and Citrix need to QA and test that those very heavyweight HPC applications cannot impact another user’s experience. A large number of customers, though, have reported a good experience and benefits using this with 3D and CAD applications such as Autodesk’s (see the comments in response to one of my earlier blogs, here).

On which guest OSes can CUDA support be enabled for GPU pass-through – Windows Server?

In the masterclass Q&A a participant asked the following question, which we followed up with NVIDIA:

  • Q: Does CUDA work on server OS with GPU pass-through? Or does it only work on desktop OS with pass-through? We didn’t succeed to use CUDA in Matlab on server OS with pass-through. Didn’t test it on desktop as yet.
  • A: GPU pass-through works with support from an appropriate NVIDIA driver. The key is NVIDIA driver support on that OS. One of the most powerful features of Citrix’s GPU technologies is that we do not use synthetic drivers (which would introduce a performance and certification overhead) and instead leverage native GPU driver support, so you get the latest OpenGL, CUDA etc. support as soon as it is available from vendors such as NVIDIA.
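Since the answer comes down to whether the NVIDIA driver in that guest OS exposes CUDA, a quick check from inside the guest can help narrow things down before blaming the application. The following is a minimal sketch, assuming Python is available in the guest. It loads the CUDA driver library through `ctypes` and calls the driver API functions `cuInit` and `cuDeviceGetCount`. The library file names used (`nvcuda.dll` on Windows, `libcuda.so.1` on Linux) are the ones the NVIDIA driver normally installs, but treat them as an assumption for your particular setup. The helper name `probe_cuda_driver` is hypothetical.

```python
import ctypes
import os


def probe_cuda_driver():
    """Return the number of CUDA devices the driver reports, or None if the
    CUDA driver library cannot be loaded or initialised in this OS."""
    # Assumed library names: the usual ones installed by the NVIDIA driver.
    on_windows = os.name == "nt"
    name = "nvcuda.dll" if on_windows else "libcuda.so.1"
    loader = ctypes.WinDLL if on_windows else ctypes.CDLL
    try:
        cuda = loader(name)
    except OSError:
        return None                      # no CUDA driver library visible here

    # cuInit(0) must return 0 (CUDA_SUCCESS) before any other driver call.
    if cuda.cuInit(0) != 0:
        return None
    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


devices = probe_cuda_driver()
if devices is None:
    print("CUDA driver not available in this guest")
else:
    print(f"CUDA driver loaded, {devices} device(s) visible")
```

If this reports no driver or zero devices, the problem most likely lies with the driver installation or the pass-through assignment rather than with the application. If it does report a device, the next place to look is the application’s own CUDA requirements; this sketch says nothing about whether a given application such as Matlab will accept that driver version.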