My colleague Derek once asked me how many angels could dance on the head of a pin within the context of GPU sharing…. Over the last few months I feel like I’ve been answering the question “How long is a piece of string?”…. With this week’s release of vGPU increasing Citrix’s portfolio of GPU solutions, I decided to answer a few questions via this blog.

I used to work in CAD/AEC/CAE and PDM/PLM application development. A lot of people assume this meant CAD applications like SolidWorks or CATIA and automotive models (cars), and yes, there was a lot of that…. an awful lot…. but CAD usage has evolved: the data is incredibly diverse and users specialise in very specific workflows. Over the last few months I’ve blogged about why XenDesktop and XenServer are particularly good platforms for CAD and PLM.

Working on the XenServer GPU pass-through and vGPU features, which are particularly appropriate for delivering 3-D data and graphics remotely, and also having some background in CAD geometric kernel design, means I’ve had an awful lot of people ask me “what does an average CAD user do?”, “what is the average CAD package?” and, most frequently, “what’s the best benchmark for this GPU stuff?”….. My response is usually “How long is a piece of string?” followed by some of the following mumblings, a mixture of thoughts, anecdotes and gotchas I didn’t spot:

  • These technologies are so impressive that everyone is trying to push them to the limit. I’ve seen some amazing demos of Unigine and ray-tracing, but in a decade as a CAD engineer I never actually used ray-tracing, never saw dragons or flying pirate ships…. In fact I had a tendency to work in wireframe because it’s so simple and fast… High-end rendering, as far as I was concerned, was something that happened way up the chain after most of the engineers were done with the parts. The reflections get in the way and look too much like edges. On the other hand, there is a whole group of users doing nothing but rendering, and conversely companies who only use it three times a year to make pretty marketing brochures for a small number of products.
  • Have you seen how boring most CAD models are! Have you!!! The brochures look like this, but the CAD designer often works on something more like this… I could bore you with tales of my wasted youth peering at the dullest collection of screws, hinges, engine bosses and sheet metal. Do benchmarks with dragons reflect your CAD usage? Maybe if you are a renderer, but for many, probably not.
  • Look for benchmarks (a few we have investigated can be found here) that use the same technologies as the applications you use, i.e. do you need to look at OpenGL or DirectX performance, and if so, which versions? (A lot of CAD applications use DX9, which means a copy of 3DMark06 can be handy.) One of the key features of all of Citrix’s GPU technologies is that they allow applications to run directly against the GPU vendor’s drivers. There is no API intercept or synthetic driver that needs to be certified or that can contain bugs, and our version support never lags that of the GPU on a physical server (i.e. you automatically get the latest versions of OpenGL and DirectX supported). If you are evaluating other vendors’ virtualisation technologies, it is worth checking whether all your key applications have been certified and are supported.
  • Look at user videos on YouTube to understand how people use CAD packages: how do they do architectural modelling, direct modelling versus history-based? Are your users using lightweight formats such as STL or Siemens NX lightweight faceting?
  • CAD users are diverse, and most probably use only 10% of a specific package’s features. Consider an application like Autodesk Inventor: historically Autodesk has had one of the biggest and longest-established bases of 2-D users (think technical drawings/plans), but nowadays Inventor has a very nice 3-D offering with ray-tracing. That big 2-D customer base is still there, and sketching and layout aren’t going to hit the GPU hard, which means you can choose a much smaller vGPU for those users and support a much larger number of users on a single card.
  • It isn’t just about the GPU: applications consume CPU and RAM and have varying I/O throughput. I’ve seen some benchmarks essentially double in performance when the CPU is doubled. Other resources can limit the ability of the entire system to use the full capacity of the GPU. Look at other metrics and check for bottlenecks! I’ve recently blogged about some tools and also about some performance whitepapers available for the XenServer platform.
  • How does it feel? Feel the love…. Benchmarks and raw numbers are great for execs who have to make and justify purchasing decisions: unambiguous numbers! But is it really usable?
    • For many benchmarks it is standard practice to turn an NVIDIA option called vsync off; this makes frame rate numbers higher and more comparable, but it can lead to visible “tears” on very intensive benchmarks, so the numeric benchmark isn’t the whole story. In practice, though, many users will find their usage less intensive and will be able to work in production with vsync off without noticing any effects.
    • Josh Mings made some really astute observations on usability over on the SolidSmack CAD site and suggested using “the selection test”; in my opinion, anyone considering benchmarking a CAD application should use this test as a basic sanity check!
  • Are you measuring frame rate on the GPU or in the desktop as seen by the application? The GPU does its work, but the human eye can only really consume about 30 fps (frames per second), and in between you have the graphics layer. Chasing the highest frame rate from the GPU alone can be misguided, as it can mean your GPU is doing excessive and unnecessary work. Consider looking at the frame rate delivered by the whole system. A higher frame rate on the GPU can improve latency, so you will probably want to check you have a frame rate higher than 30 directly from the GPU. This is obviously a question Autodesk have been asked in the context of their recent trial of Revit via a browser on the Amazon Cloud, and their comments on fps seem to confirm our investigations too.
  • Synchronised benchmarks: is running a really demanding benchmark on a GRID card with the maximum number of desktops (VDIs) attached, all doing exactly the same task, really realistic? It might be if your entire company focusses on rendering or gaming, but for a large number of companies it is more realistic that their users hit the GPUs in bursts, with a large number of desktops running Microsoft Word, checking Facebook or sitting completely idle…. How much time do your staff spend in meetings, on the phone, coping with their inboxes or in the coffee room? I was surprised at the results produced by starting benchmarks on the same machine with delays of several minutes.
  • What are your company’s demographics: how many secretaries, PLM users and CAD designers do you have, and in what ratio? If your company makes plastic buckets and you have 100 people in sales and accounting and 3 designers, consider using a selection of benchmarks and weighting them accordingly. Don’t just think about the few high-end existing CAD users; consider the potential to accelerate the performance of all your workers.
  • Benchmarking in just one VM: I’m not convinced that this on its own is a particularly revealing test. These technologies are about SHARING (yes kids, play nicely!). If you think only a few of your users will be hitting the GPU simultaneously, consider running the benchmark on one or two desktops while other desktops run less GPU-demanding loads, so you can appreciate the effects of those applications on other resources such as CPU.
  • Composite benchmarks: many benchmarking programs consist of multiple tests covering a wide range of CAD-like tasks (2-D modelling, 3-D modelling, rendering, lighting effects) plus a range of model sizes, from quite small widgets up to vast assemblies. It’s very likely your own users will use these features in a different ratio to the benchmark. The SPECviewperf 11 benchmark is a synthetic benchmark that replays the footprint of a number of popular CAD programs, and I like the fact that they detail what features are in each sub-test, e.g. for Pro/E (Creo) the first sub-test covers wireframe modelling, which, as my de facto preferred viewing mode, I was particularly interested in.
  • Look closely at the frame buffer (in megabytes) of the vGPU type: many benchmarks stipulate a minimum buffer size, often 512 MB or more, so the two NVIDIA vGPU types with 256 MB, the K100 and K200, may behave quite oddly on graphically demanding benchmarks. I was lucky enough to have some other NVIDIA cards around, and testing on a physical server confirmed my theories; if you have this option, it’s worth considering. The artefacts of too small a frame buffer look the same on a physical machine!
  • Have you turned the GPU on? Yes, it is a serious question! I’ve seen a lot of customers struggle with benchmarks because many applications require you to turn on “hardware acceleration”; if it looks like the GPU isn’t being used as heavily as it should be, this could be the reason. I hit this myself on some Autodesk applications and got started thanks to OpenBoundaries’ YouTube guide. The guys from OpenBoundaries and Thomas Poppelgaard have added a lot of additional information on Autodesk application configuration in the comments of this blog (“OpenBoundaries delivering CAD applications remotely through the use of Citrix XenDesktop and XenServer accelerated by GPU sharing of NVIDIA GRID technology”).
  • I’ve seen quite a few users looking to benchmark CPU frequency for applications such as CATIA. It is very application-specific, but some CAD applications appear to work well with a single core and Intel’s Turbo Boost technology. I blogged yesterday about how you can configure Turbo Boost for XenServer and interpret the results you see; the results I’ve personally seen seem to be extremely dependent on an individual application’s or benchmark’s demands.
  • Is your measurement itself affecting performance? vGPU is a very efficient technology, and care needs to be taken to ensure that the act of measuring GPU performance doesn’t itself hinder or interfere with it. I learned myself that when using the nvidia-smi monitoring tool, it was important to run it in looping mode using the -l option.
  • Think of the possibilities! Yes, look at what you can do today, but also think of how your company could change, or use its hardware better, if you virtualise access to your GPUs.
    • If your designers work for only 8 hours a day using GPUs at their desks, that computing power could be unused 70% of the time. If you do a lot of simulation work with products like ANSYS Fluent, HFSS or CFX, FEKO or COMSOL Multiphysics, you could allocate the designers vGPUs during the day and then at night use the same GPUs in pass-through mode to increase your capacity to run simulation and solver design iterations, or you could use them to run additional test cycles, raising your product quality. CAD hates regressions!
    • A few weeks ago I blogged about CIMdata’s research into how PLM users increasingly need to work remotely, at locations other than their workstations: on tablets, smartphones and laptops at customer sites. Are your working practices at the moment constrained by your current physical infrastructure? Consider benchmarking and assessing how you would _like_ to work.
    • I’ve admitted I’ve been a Luddite with my love of wireframe, but that’s probably due to having worked on some seriously flaky infrastructure and networking back in my CAD days; the extra power available with GPU acceleration means that users like me should look at upgrading their CAD practices and even applications. There are some seriously beautiful and powerful rising stars, such as Altair’s solidThinking, bringing simulation into the design process. Additionally, those into their direct-modelling paradigms are probably a lot more sensitive to interactive response and latency than most; if you are looking to use Synchronous Technology or SpaceClaim more heavily, then GPU acceleration is definitely an option you should consider.
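To make the GPU-versus-delivered frame rate point above a little more concrete, here is a rough triage sketch. The ~30 fps target comes from the discussion above, but the 4x “wasted work” threshold is entirely my own illustrative number, not a Citrix or NVIDIA recommendation:

```python
DISPLAY_TARGET_FPS = 30  # roughly what the discussion above suggests the eye/remoting layer consumes

def fps_assessment(gpu_fps, delivered_fps):
    """Crude triage of a frame rate measurement: the delivered rate should
    clear the ~30 fps target, while a GPU rate far above what is actually
    delivered mostly indicates wasted rendering work (thresholds are
    illustrative assumptions, not vendor guidance)."""
    if delivered_fps < DISPLAY_TARGET_FPS:
        return "below target: look for a bottleneck between GPU and desktop"
    if gpu_fps > 4 * delivered_fps:
        return "GPU far ahead of delivery: consider vsync or frame limiting"
    return "ok"
```

For example, a GPU-side 300 fps with only 35 fps delivered would be flagged as mostly wasted work, while 60 fps rendered and 35 fps delivered looks healthy.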
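On the synchronised-starts point: a simple way to avoid every desktop hammering the GPU at the same instant is to spread benchmark launch times over a window, as in this minimal sketch (the desktop count and five-minute window are arbitrary choices for illustration):

```python
import random

def staggered_starts(n_desktops, window_seconds, seed=None):
    """Return one start offset (in seconds) per desktop, spread randomly
    over the window instead of launching every benchmark at t=0."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0, window_seconds) for _ in range(n_desktops))

# e.g. eight desktops launched over a five-minute window
offsets = staggered_starts(8, 300, seed=42)
```

A scheduler (or even a sleep before each benchmark script) can then consume these offsets, which mimics bursty real-world usage far better than a simultaneous start.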
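The demographics point above lends itself to a simple weighted score: benchmark each workload profile separately, then weight the results by head count. All the figures below are made up purely for illustration:

```python
def weighted_score(scores, user_counts):
    """Average per-workload benchmark scores, weighted by how many
    users actually run each workload."""
    total_users = sum(user_counts.values())
    return sum(scores[w] * user_counts[w] for w in scores) / total_users

# Hypothetical per-workload scores and a bucket-maker's head count
scores = {"2d_drafting": 120.0, "3d_modelling": 45.0, "rendering": 12.0}
users = {"2d_drafting": 80, "3d_modelling": 15, "rendering": 5}
overall = weighted_score(scores, users)
```

With these made-up numbers the heavy 2-D population dominates the overall figure, which is exactly the effect the bucket-factory example above is driving at.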
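The frame-buffer warning above can be turned into a quick lookup. The sizes below are the GRID vGPU profile figures as commonly published at the time of writing; verify them against NVIDIA’s current documentation before relying on them:

```python
# Frame buffer per NVIDIA GRID vGPU profile, in MB (figures as commonly
# published at the time of writing; check NVIDIA's documentation)
VGPU_FRAMEBUFFER_MB = {
    "K100": 256, "K200": 256,
    "K140Q": 1024, "K240Q": 1024, "K260Q": 2048,
}

def profiles_meeting(min_mb):
    """Return the vGPU profiles whose frame buffer meets a benchmark's
    stated minimum size in MB."""
    return sorted(p for p, mb in VGPU_FRAMEBUFFER_MB.items() if mb >= min_mb)
```

With a 512 MB benchmark minimum, the 256 MB K100 and K200 drop out, matching the warning above about odd behaviour on demanding benchmarks.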
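Finally, on measurement overhead: nvidia-smi can be left running in looping mode (`nvidia-smi -l <seconds>`) rather than being invoked repeatedly, and its CSV query output is easy to parse offline afterwards. Here is a sketch of a parser for output in the shape produced by `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits` (the sample string is illustrative, not captured from a real card):

```python
def parse_gpu_samples(csv_text):
    """Parse nvidia-smi CSV query output (GPU index, utilisation %,
    memory used in MB) into a list of integer tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append((int(index), int(util), int(mem)))
    return rows

sample = "0, 87, 1450\n1, 3, 220\n"  # illustrative values, not real measurements
samples = parse_gpu_samples(sample)
```

Collecting once and parsing later keeps the measurement tooling itself off the system under test as much as possible, which is the point of the bullet above.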

Even if you get your benchmarking wrong, or your usage pattern turns out to be different to what you first imagined, you can easily revisit and re-evaluate your decision with XenDesktop. XenDesktop is available not only on XenServer but also on vSphere and Hyper-V, so if you decide down the line that vGPU isn’t for you, choosing XenDesktop lets you retain the option to change your GPU technology and respond to hypervisor licensing costs. No benchmark will measure softer factors such as vendor lock-in! But if you do opt for vGPU and want to re-evaluate your choice after an initial trial, XenDesktop is the only VDI solution to offer you the full range of hypervisors and GPU technologies.