XenCenter monitoring NVIDIA K2 GPU usage

XenServer 6.2 SP1 introduced virtualised GPU (vGPU) technology to the XenDesktop platform, enabled by HDX 3D Pro. To accompany it, new metrics were added to Citrix XenServer that expose the data available from NVIDIA’s GRID K1 and K2 cards through NVIDIA’s NVML library.

The new metrics available are:

| Class | Name | Units | Description | Enabled by default? | Condition for existence |
|---|---|---|---|---|---|
| Host | gpu_memory_free_<pci-bus-id> | bytes | Unallocated framebuffer memory | No | A supported GPU is installed on the host |
| Host | gpu_memory_used_<pci-bus-id> | bytes | Allocated framebuffer memory | No | A supported GPU is installed on the host |
| Host | gpu_power_usage_<pci-bus-id> | mW | Power usage of this GPU | No | A supported GPU is installed on the host |
| Host | gpu_temperature_<pci-bus-id> | °C | Temperature of this GPU | No | A supported GPU is installed on the host |
| Host | gpu_utilisation_compute_<pci-bus-id> | (fraction) | Proportion of time over the past sample period during which one or more kernels was executing on this GPU | No | A supported GPU is installed on the host |
| Host | gpu_utilisation_memory_io_<pci-bus-id> | (fraction) | Proportion of time over the past sample period during which global (device) memory was being read or written on this GPU | No | A supported GPU is installed on the host |
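
Since none of these metrics is enabled by default, each one has to be switched on for the host before it is recorded. The sketch below is one possible way to build the six metric names for a card and the matching xe commands. It assumes the `xe host-data-source-record` and `xe host-data-source-query` commands described in the Administrator’s Guide. The PCI bus ID `0000:05:00.0` and the host name `xs-host-1` are made-up placeholders, so check `xe host-data-source-list` on your own host for the exact names.

```python
# Metric name stems taken from the table above; the suffix is the GPU's PCI bus ID.
GPU_METRIC_PREFIXES = (
    "gpu_memory_free",
    "gpu_memory_used",
    "gpu_power_usage",
    "gpu_temperature",
    "gpu_utilisation_compute",
    "gpu_utilisation_memory_io",
)


def gpu_metric_names(pci_bus_id):
    """Return the six host data-source names for one physical GPU."""
    return [f"{prefix}_{pci_bus_id}" for prefix in GPU_METRIC_PREFIXES]


def xe_record_command(host, data_source):
    """Build (but do not run) the xe command that starts recording a metric."""
    return ["xe", "host-data-source-record",
            f"data-source={data_source}", f"host={host}"]


def xe_query_command(host, data_source):
    """Build (but do not run) the xe command that reads a metric's current value."""
    return ["xe", "host-data-source-query",
            f"data-source={data_source}", f"host={host}"]


if __name__ == "__main__":
    # Hypothetical bus ID and host name, for illustration only.
    for name in gpu_metric_names("0000:05:00.0"):
        print(" ".join(xe_record_command("xs-host-1", name)))
```

The functions only return argument lists, so they can be inspected first and then passed to `subprocess.run` on a machine where the xe CLI is installed.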

We are limited to the metrics that NVML exposes, so users should be aware that these metrics describe the physical GPU rather than an individual vGPU.
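
Because the figures describe the whole physical card, it can help to convert one sample into more readable numbers before comparing hosts. The following is a rough sketch only. The function name and the returned keys are invented for illustration, and it assumes that free plus used framebuffer memory approximates the card's total.

```python
MIB = 1024 * 1024


def summarise_gpu(memory_free_bytes, memory_used_bytes, power_mw, utilisation_compute):
    """Convert one sample of the raw metrics into friendlier units.

    Assumption: total framebuffer is taken as free + used.
    """
    total = memory_free_bytes + memory_used_bytes
    return {
        "framebuffer_total_mib": total / MIB,
        # Guard against a zero total so a missing sample does not raise.
        "framebuffer_used_fraction": memory_used_bytes / total if total else 0.0,
        "power_w": power_mw / 1000.0,                    # mW -> W
        "compute_percent": utilisation_compute * 100.0,  # fraction -> %
    }


if __name__ == "__main__":
    # Made-up sample values: 3 GiB free, 1 GiB used, 117 W, 50% busy.
    print(summarise_gpu(3 * 1024**3, 1024**3, 117000, 0.5))
```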

Those looking to use these metrics in benchmarking analysis may also be interested in some of the previous posts on GPU benchmarking on the Citrix blogs.

Over the last few versions of XenServer we’ve been increasing the number of metrics available and improving the documentation and alerting mechanisms around them. The new GPU metrics complement those documented in Chapter 9 of the XenServer 6.2 Administrator’s Guide. That documentation is extensive and covers:

  • Existing metrics and their units, including IOPS per virtual disk, latency, CPU usage, CPU P-states and many others
  • Details of metrics not enabled by default and how to enable them
  • Documentation for tools to generate the metrics in CSV format
  • How to enable and view metrics from XenCenter
  • How to configure alerts and automated emails based on metric thresholds

The metrics from XenServer are available from the XenCenter GUI (details included in this blog) and are also directly available programmatically to system administrators. Many are incorporated into third-party monitoring products such as CA Nimsoft, eG Innovations, Goliath’s MonitorIT and many others. The availability of these new metrics should help XenDesktop users looking to monitor and benchmark GPU usage, and I hope we will soon see them incorporated into third-party monitoring products.

For developers and enthusiastic system administrators looking to write monitoring products, we also provide developer guides on the xenserver.org site, including sample code for metric collection. The metrics are available not only from the XenServer command line interface (xe CLI) but also via the XenServer PowerShell cmdlets and the Python, Java, C# and C APIs.
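
As a rough illustration of programmatic collection, the sketch below pulls the GPU series out of an RRD update document. It assumes the host's `rrd_updates` HTTP handler and its `xport` XML layout, in which legend entries take the form `AVERAGE:host:<uuid>:<metric>`. The sample document, host UUID and bus ID are invented, and authentication is left out entirely, so treat it as a starting point rather than a reference client.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def rrd_updates_url(host, start, host_metrics=True):
    """Build the URL for metric updates since `start` (seconds since the epoch)."""
    query = {"start": int(start), "cf": "AVERAGE"}
    if host_metrics:
        query["host"] = "true"  # include host-level metrics such as gpu_*
    return f"https://{host}/rrd_updates?{urlencode(query)}"


def parse_rrd_updates(xml_text, name_prefix="gpu_"):
    """Return {metric_name: [(timestamp, value), ...]} for matching metrics."""
    root = ET.fromstring(xml_text)
    legend = [entry.text for entry in root.findall("./meta/legend/entry")]
    series = {}
    for row in root.findall("./data/row"):
        timestamp = int(row.findtext("t"))
        values = [float(v.text) for v in row.findall("v")]
        for entry, value in zip(legend, values):
            # Split at most three times: the metric name itself may contain
            # colons when it ends in a PCI bus ID.
            metric = entry.split(":", 3)[-1]
            if metric.startswith(name_prefix):
                series.setdefault(metric, []).append((timestamp, value))
    return series


# Invented sample in the assumed layout, for illustration only.
SAMPLE = """<xport>
  <meta>
    <start>100</start><step>5</step><end>105</end>
    <rows>2</rows><columns>2</columns>
    <legend>
      <entry>AVERAGE:host:HOST-UUID:cpu0</entry>
      <entry>AVERAGE:host:HOST-UUID:gpu_temperature_0000:05:00.0</entry>
    </legend>
  </meta>
  <data>
    <row><t>105</t><v>0.10</v><v>61.0</v></row>
    <row><t>100</t><v>0.20</v><v>60.0</v></row>
  </data>
</xport>"""

if __name__ == "__main__":
    print(rrd_updates_url("xs-host-1", 100))
    print(parse_rrd_updates(SAMPLE))
```

Keeping the parser separate from the HTTP request means it can be tested against a saved response before being pointed at a live host.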