Although I’m a product architect, and my co-workers accuse me of doing nothing more than drawing boxes and lines all day long, I do occasionally get to roll my sleeves up.  I’ve recently been working on a project that involved collecting detailed information about the storage I/O performed by a ‘typical’ desktop running inside a VM for a day in the office.  I captured a trace of all of the I/O caused by the VM, and then set about analyzing the data in the evening.  That’s when I noticed something a bit strange that reinforced the need to perform storage optimization on your VDI operating system image.

Hopefully my boss isn’t reading this, so he won’t realize I only got a couple of hours’ work done before I headed for lunch, but here are the graphs that made me do a double-take:

In case you’re wondering, these graphs show the same data, but captured in two different ways.  The top graph is a perfmon capture of the storage operations per second of my VM, the lower one is the same data calculated from the raw trace of all the VM’s I/O and visualized in Excel.   They show the same patterns, which gives me confidence we’re seeing something real.  Also, both graphs are scaled down by a factor of 10, so I can fit them together for comparison – the peak IOPS was over 100.
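For anyone wanting to reproduce the second graph, here is a minimal Python sketch of how per-second IOPS might be derived from a raw trace.  It assumes the trace has already been reduced to a list of timestamps in seconds since the start of the capture; that format and the function name are my own illustration, not the tooling used for the graphs above.

```python
from collections import Counter


def iops_per_second(timestamps):
    """Count I/O operations in each one-second bucket.

    timestamps: iterable of non-negative floats, seconds since the start
    of the trace (a hypothetical format; a real trace needs parsing first).
    Returns a list where index i holds the number of operations in [i, i+1).
    """
    buckets = Counter(int(t) for t in timestamps)
    if not buckets:
        return []
    # Fill in zeroes for seconds with no I/O so the series plots correctly.
    return [buckets.get(s, 0) for s in range(max(buckets) + 1)]


sample = [0.1, 0.5, 0.9, 1.2, 3.0, 3.4]
print(iops_per_second(sample))  # [3, 1, 0, 2]
```

The resulting list can be charted directly and compared against a perfmon capture of the same period.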

See that big period of high disk activity starting around 12:30pm?  That’s right around the time I headed for lunch.  And look, it stops just as abruptly when I got back to my desk.  Huh – that’s ‘weird’, almost like my desktop was doing more work when I wasn’t there.

So, how much I/O does that period represent?  Here’s a graph showing the cumulative I/O for the VM over time:

That’s a huge proportion of the overall I/O activity for the entire day that occurred in a short period when I wasn’t even using my desktop.  Of the 541369 I/O operations for the entire day, more than half were generated whilst I was enjoying my lunch.
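The "more than half" figure is just the share of operations that fall inside a time window.  A rough sketch of that calculation, assuming each operation has been tagged with its time of day; the sample times and the window end of 13:30 are made up for illustration and are not values from my trace:

```python
from datetime import time


def share_in_window(op_times, start, end):
    """Return the fraction of operations whose time of day is in [start, end)."""
    op_times = list(op_times)
    if not op_times:
        return 0.0
    inside = sum(1 for t in op_times if start <= t < end)
    return inside / len(op_times)


# Illustrative data only: five operations, three of them over "lunch".
ops = [time(9, 15), time(12, 40), time(12, 55), time(13, 10), time(16, 0)]
print(share_in_window(ops, time(12, 30), time(13, 30)))  # 0.6
```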

I’m curious by nature, so I decided to take a look at the kind of I/O operations being performed.  Ruben Spruijt has illustrated that 4KB I/O typically dominates for desktop OSes, and that’s what we normally see as well – but look at the data I captured:

Looking over the entire day, 16KB I/O operations represented more than 60% of the total read operations for the desktop.  Was this disparity caused by the mysterious lunchtime activity?  I guess so; the lunchtime operations are shown in blue.
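A breakdown like this can be computed by counting read operations per request size.  The sketch below assumes each trace record has been reduced to a (kind, size in bytes) pair, which is a hypothetical format chosen for illustration:

```python
from collections import Counter


def read_size_shares(ops):
    """Return {size_bytes: fraction of all reads} for the given operations.

    ops: iterable of (kind, size_bytes) tuples, where kind is 'R' for a
    read or 'W' for a write (an assumed record format, not a real one).
    """
    sizes = Counter(size for kind, size in ops if kind == "R")
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {size: count / total for size, count in sizes.items()}


# Illustrative data only: four reads (three of them 16KB) and one write.
sample = [("R", 16384), ("R", 16384), ("R", 4096), ("W", 4096), ("R", 16384)]
print(read_size_shares(sample)[16384])  # 0.75
```

Running the same function over only the operations inside the lunchtime window would show whether that period accounts for the skew.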

So, all ways round, this is a serious amount of storage load – what’s happening?

After digging around, I found that the OS image had a number of scheduled tasks that were configured with their out-of-box defaults to run on every boot after 10 minutes of inactivity.  Looking at these tasks, they all started at precisely 12:38pm:

  • Autochk
  • Diagnosis
  • SystemRestore

I can’t be sure which of these was the culprit, but we have our smoking gun.
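If you want to hunt for similar tasks on your own image, one approach is to dump the task list with `schtasks /query /fo csv /v` and filter on the last run time.  The sketch below is not how I did it, just one way it could be scripted.  It assumes the English-locale column names "TaskName" and "Last Run Time", and it matches the time as a plain substring because the date format depends on the machine’s regional settings.

```python
import csv
import io
import subprocess


def tasks_last_run_matching(csv_text, needle):
    """Return names of tasks whose 'Last Run Time' column contains needle.

    csv_text: verbose CSV output from schtasks.  The column names used
    here are assumed from English-locale output and may differ elsewhere.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["TaskName"]
        for row in reader
        if needle in (row.get("Last Run Time") or "")
    ]


def query_scheduled_tasks():
    """Run schtasks on a Windows machine and return its CSV output."""
    result = subprocess.run(
        ["schtasks", "/query", "/fo", "csv", "/v"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# On the desktop itself (Windows only), something like:
#   print(tasks_last_run_matching(query_scheduled_tasks(), "12:38"))
```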

Optimizing the OS

In a VDI deployment it’s strongly recommended that you optimize the OS for efficient storage utilization.  A tool for doing this has been a key part of Citrix Provisioning Server for many years:

A lot of people don’t realize that since XenDesktop 5, this tool has been such an integral part of XenDesktop that we silently run it as part of installing the Citrix code that runs within user desktops.  Whether you’re using MCS or PVS, or provisioning the desktops using your own scripts, you get the benefit.

You shouldn’t have to remember to do this

When we designed XenDesktop 5 we decided that storage optimization is so critical that we wouldn’t offer the option of not optimizing the OS behavior.

Some of these optimizations, such as disabling machine account password changes or disabling the recycle bin, aren’t things you’d want performed silently and automatically, so we don’t perform them all.

So why did I see such a large I/O bump in my traces?  I hadn’t installed the Citrix desktop software – for the purposes of my project, I wanted a more vanilla environment.

The optimization tool is only run automatically if you install the desktop software using the ‘autorun’ wizard (autoselect.exe) – if you install the individual MSIs, you won’t benefit from this automatic optimization.  The tool is installed (but not run) by the MachineIdentityServiceAgent_{x86|x64}.msi file, and can be run directly from C:\Program Files\Citrix\PvsVm\TargetOSOptimizer\TargetOSOptimizer.exe

Conclusions

I think there are a couple of things we can all learn from my experience.

First, as an industry, we need to allow ‘idle’ periods in our VDI deployment validation and load testing tools.  A lot of scale testing looks at the system under load – but as the data above shows, if a desktop is not fully optimized, Windows has a habit of storing up trouble for when you least expect it.

Second, always, always optimize your OS for storage performance: by having XenDesktop do it automatically, by running an optimization tool yourself if you go for a more advanced type of install, or even by performing the optimizations ‘by hand’ on a master image.