With PVS 7.1 and later you may have noticed a new caching option called “Cache in Device RAM with Hard Disk Overflow”. We actually implemented this new feature to address some application compatibility issues with Microsoft ASLR and PVS. You can check out CTX139627 for more details.
One of the most amazing side effects of this new feature is that it can give a significant performance boost and drastically increase IOPS throughput for PVS targets while reducing, and sometimes eliminating, the IOPS that ever hit physical storage!
My colleague Dan Allen and I have recently been testing this new feature in both lab and real customer environments, and we wanted to share some of our results along with some “new” recommended practices for PVS that encourage everyone to start taking advantage of this new cache type. This is a two-part blog series. In this first part, I recap the various write cache options for PVS and discuss some of the new results we are seeing with the new cache type. In part 2 of the series, Dan Allen dives deeper into performance results and provides some guidelines for properly sizing the RAM for both XenApp and VDI workloads.
PVS Write Cache Types
When using Provisioning Services, one of the most important things to ensure optimal performance is the write cache type. I am sure that most of you are already familiar with the various types, but I will review them here again as a refresher!
- Cache on server: this write-cache type places the write-cache on the PVS server. By default it is placed in the same location as the vDisk, but a different path can be specified.
This write-cache type provides poor performance when compared to other write cache types, limits high availability configurations, and should almost never be used in a virtual environment. This option was typically only used when streaming to physical end points or thin clients that are diskless.
- Cache on device’s hard drive: this write-cache type creates a write-cache file (.vdiskcache) on the target device’s hard drive. It requires an NTFS-formatted hard drive on the target device to be able to create this file on the disk. This cache type has been our leading Citrix best practice for environments to date, and most of our deployments use it because it provides the best balance between cost and performance. To achieve the highest throughput to the write-cache drive, Intermediate Buffering should almost always be used (use caution with target devices hosted on Hyper-V, where we have occasionally seen adverse effects). Intermediate Buffering allows writes to use the underlying buffers of the disk/disk driver before committing them to disk, which lets the PVS disk driver continue working rather than waiting for the write on disk to finish, therefore increasing performance. By default this feature is disabled. For more information on Intermediate Buffering, including how to enable it, please refer to CTX126042.
- Cache in device RAM: this write-cache type reserves a portion of the target device’s memory for the write cache, meaning that whatever portion of RAM is used for write-cache is not available to the operating system. The amount of memory reserved for write-cache is specified in the vDisk properties. This option provides better throughput, better response times, and higher IOPS for write-cache than the previous types because it writes to memory rather than disk.
There are some challenges with this option, though. First of all, there is no overflow, so once the write cache is filled the device will become unusable (might even blue screen). Therefore, there has to be plenty of RAM available for the target devices to be able to operate and not run out of write-cache space, which can be expensive, or just not possible because of memory constraints on the physical host. Second, if there is a need to store persistent settings or data such as event logs, a hard drive will still be required on each target. On the flip side, this hard disk will not be as large or use as many IOPS as when using “Cache on device’s hard drive” since the write cache will not be on it. We have typically seen customers successfully use this feature when virtualizing XenApp since you do not run as many XenApp VMs on a physical host (compared to VDI), so often times there is enough memory to make this feature viable for XenApp.
- Cache in device RAM with overflow on hard disk: this is a new write-cache type and is basically a combination of the previous two, but with a different underlying architecture. It provides a write-cache buffer in memory, and the overflow is written to disk. However, the way that memory and disk are used is different than with “Cache in device RAM” and “Cache on device’s hard drive” respectively. This is how it works:
- Just as before, the buffer size is specified in the vDisk properties. By default, the buffer is set to 64 MB but can be set to any size.
- Rather than reserving a portion of the device’s memory, the cache is mapped to Non-paged pool memory and used as needed, and the memory is given back to the system if the system needs it.
- On the hard drive, instead of using the old “.vdiskcache” file, a VHDX (vdiskdif.vhdx) file is used.
- On startup, the VHDX file is created and is 4 MB due to the VHDX header.
- Data is written to the buffer in memory first. Once the buffer is full, “stale” data is flushed to disk.
- Data is written to the VHDX in 2 MB blocks, instead of 4 KB blocks as before. This will cause the write-cache file to grow faster in the beginning than the old “.vdiskcache” cache file. However, over time, the total space consumed by this new format will not be significantly larger as data will eventually back fill into the 2 MB blocks that are reserved.
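The write path described in the steps above can be sketched as a toy model. This is only an illustration, not the PVS driver's actual logic: the class name, the 4 KB write unit, and the choice to treat the least recently written data as “stale” are all assumptions made for the example.

```python
from collections import OrderedDict

SECTOR = 4 * 1024  # bytes per write in this toy model (an assumption)


class RamCacheWithOverflow:
    """Hypothetical model: a RAM write buffer that spills its oldest data to disk."""

    def __init__(self, buffer_bytes):
        self.capacity = buffer_bytes // SECTOR  # number of sectors that fit in RAM
        self.ram = OrderedDict()                # sector -> None, oldest write first
        self.overflow = set()                   # sectors whose newest copy is on disk

    def write(self, sector):
        # Rewriting a sector moves its newest copy (back) into RAM.
        self.ram.pop(sector, None)
        self.overflow.discard(sector)
        self.ram[sector] = None
        # Once the buffer is full, flush the oldest ("stale") sectors to disk.
        while len(self.ram) > self.capacity:
            stale, _ = self.ram.popitem(last=False)
            self.overflow.add(stale)

    def where(self, sector):
        """Report where a read for this sector would be served from."""
        if sector in self.ram:
            return "ram"
        if sector in self.overflow:
            return "disk"
        return "vdisk"  # never written, so it still comes from the base image


cache = RamCacheWithOverflow(buffer_bytes=64 * 1024 * 1024)  # 64 MB default buffer
for sector in range(16385):  # one sector more than the buffer can hold
    cache.write(sector)
print(cache.where(0), cache.where(16384), cache.where(99999))  # disk ram vdisk
```

With the default 64 MB buffer, the model holds 16,384 writes of 4 KB in RAM, and the 16,385th write pushes the oldest one out to the overflow file.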
A few things to note about this write-cache type:
- The write-cache VHDX file will grow larger than the “.vdiskcache” file format. This is due to the VHDX format using 2 MB blocks vs. 4 KB blocks. Over time, the size of the VHDX file will normalize and become closer in size to what the “.vdiskcache” would be, as data will eventually back fill into the 2 MB blocks that are reserved. The point at which the size normalizes varies by environment depending on the workload.
- Intermediate buffering is not supported with this write-cache type (this cache type is actually designed to replace it).
- System cache and vDisk RAM cache work in conjunction. What I mean by this is that if there is block data that is moved from the PVS RAM cache into the disk overflow file, but it is still available in the Windows System Cache, it will be re-read from memory rather than disk.
- This write-cache type is only available for Windows 7/2008 R2 and later.
- This cache type addresses interoperability issues between Microsoft ASLR and the Provisioning Services write-cache, where we have seen application and printer instability that results in undesirable behavior. Therefore, this cache type will provide the best stability.
- A PVS 7.1 hotfix is required for this write-cache type to work properly: 32-bit and 64-bit.
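To make the file-size note in this list concrete, here is a rough sketch of how allocation granularity affects space consumed. The function name and the write patterns are hypothetical, and the model ignores VHDX metadata. It only counts how many allocation units are touched.

```python
KB = 1024
MB = 1024 * KB


def allocated_bytes(offsets, granularity):
    """Space consumed when every touched region is allocated in whole units."""
    units = {offset // granularity for offset in offsets}
    return len(units) * granularity


# Early in a session: 100 scattered 4 KB writes, each landing in a different 2 MB block.
sparse = [i * 2 * MB for i in range(100)]
print(allocated_bytes(sparse, 4 * KB) // KB, "KB with 4 KB units")  # 400 KB
print(allocated_bytes(sparse, 2 * MB) // MB, "MB with 2 MB units")  # 200 MB

# Later: the same 100 blocks have been completely back filled with 4 KB writes.
dense = range(0, 200 * MB, 4 * KB)
print(allocated_bytes(dense, 4 * KB) // MB, "MB with 4 KB units")   # 200 MB
print(allocated_bytes(dense, 2 * MB) // MB, "MB with 2 MB units")   # 200 MB
```

In this model, scattered early writes make the 2 MB format look far larger, while the two formats converge once the reserved blocks fill in. Where real workloads land between those extremes depends on the environment.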
New PVS RAM Cache Results
Now, a review of the newest cache type wouldn’t be complete if we didn’t share some results of some of our testing. I will summarize some of the impressive new results we are seeing and in Part 2 of the series, Dan Allen will dive much deeper into the results and provide sizing considerations.
Test Environment 1
Server CPU: 2 x 8 core CPU Intel 2.20 GHz
Server RAM: 256 GB
Hypervisor: vSphere 5.5
Storage: EMC VNX 7500. Flash in tier 1 and 15K SAS RAID 1 in tier 2. (Most of our IOPS stayed in tier 1)
XenApp Virtual Machine
vServer CPU: 4 vCPU
vServer RAM: 30 GB
vServer OS: Windows 2012
vServer Disk: 30 GB Disk (E: disk for PVS write cache on tier 1 storage)
We ran 5 tests using IOMETER against the XenApp VM so that we could compare the various write cache types. The 5 tests are detailed below:
- E: Drive Test: This IOMETER test used an 8 GB file configured to write directly to the write-cache disk (E:), bypassing PVS. This test shows the true underlying IOPS provided by the SAN.
- New PVS RAM Cache with disk Overflow: We configured the new RAM cache to use up to 10 GB RAM and ran the IOMETER test with an 8 GB file so that all I/O would remain in the RAM.
- New PVS RAM Cache with disk Overflow: We configured the new RAM cache to use up to 10 GB RAM and ran the IOMETER test with a 15 GB file so that at least 5 GB of I/O would overflow to disk.
- Old PVS Cache in Device RAM: We used the old PVS Cache in RAM feature and configured it for 10 GB RAM. We ran the IOMETER test with an 8 GB file so that the RAM cache would not fill up, since a full cache would make the VM crash!
- PVS Cache on Device Hard Disk: We configured PVS to cache on device hard disk and ran the IOMETER test with an 8 GB file.
With the exception of the size of the IOMETER test file as detailed above, all of the IOMETER tests were run with the following parameters:
- 4 workers configured
- Queue depth set to 16 for each worker
- 4 KB block size
- 80% Writes / 20% Reads
- 90% Random IO / 10% Sequential IO
- 30 minute test duration
| Test # | IOPS | Read IOPS | Write IOPS | MBps | Read MBps | Write MBps | Avg Response Time (ms) |
* In test scenario 3, when the write-cache first started to overflow to disk the IOPS count dropped to 31,405 and the average response time was slightly over 2 ms for a brief period of time. As the test progressed, the IOPS count gradually increased back up and the response time decreased. This was due to the PVS driver performing the initial flush of large amounts of data to the disk to make enough room in RAM so that most of the data could remain in RAM. Even during this initial overflow to disk, the total IOPS was still nearly twice as fast as what the underlying disk could physically provide!
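The relationship between the IOPS and response-time figures in this note can be sanity-checked with Little's Law (outstanding I/Os = IOPS × average response time). The sketch below assumes IOMETER keeps every worker's queue full for the whole run and that 1 MB means 2^20 bytes. The function names are made up for illustration.

```python
WORKERS = 4               # from the test parameters above
QUEUE_DEPTH = 16          # outstanding I/Os per worker
BLOCK_BYTES = 4 * 1024    # 4 KB block size
WRITE_FRACTION = 0.80     # 80% writes / 20% reads


def avg_response_ms(iops):
    """Little's Law: latency = outstanding I/Os / throughput."""
    outstanding = WORKERS * QUEUE_DEPTH
    return outstanding / iops * 1000.0


def throughput_mbps(iops):
    """Convert an IOPS figure to MB/s at the configured block size."""
    return iops * BLOCK_BYTES / (1024 * 1024)


def split_read_write(iops):
    """Split total IOPS into (read, write) using the configured mix."""
    writes = iops * WRITE_FRACTION
    return iops - writes, writes


dip = 31405  # IOPS during the initial overflow to disk, from the note above
print(round(avg_response_ms(dip), 2), "ms")    # 2.04 ms
print(round(throughput_mbps(dip), 1), "MB/s")  # 122.7 MB/s
reads, writes = split_read_write(dip)
print(round(reads), "read IOPS,", round(writes), "write IOPS")
```

With 64 I/Os outstanding, 31,405 IOPS works out to about 2.04 ms per I/O, which lines up with the “slightly over 2 ms” observed during the initial flush.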
As you can see from the numbers above, we are getting some amazing results from our new RAM Cache feature. In our test, the tier 1 storage was able to provide a raw capability of a little over 18K IOPS, which is pretty darn good! However, when using our new RAM Cache with overflow feature, we were able to get roughly 70K IOPS when staying in RAM and were able to maintain nearly 69K IOPS even when we had a 15 GB workload and only a 10 GB RAM buffer. There are also a few other very interesting things we learned from this test:
- The old PVS Cache in RAM feature could not push above 14K IOPS. This is most likely due to the old driver architecture used by this feature. The new Cache in RAM with disk overflow is actually more than 4 times faster than the old RAM cache!
- The PVS “Cache on device hard disk” option, which uses the old .vdiskcache type, could only drive about 50% of the IOPS that the actual flash SAN storage could provide. Again, this is due to limitations in the older driver architecture.
From a performance perspective, the new Cache in Device RAM with Hard Disk Overflow option is clearly the best choice, and we encourage everyone to take advantage of it. However, it is critical that proper testing be done in a test system/environment in order to understand the storage requirements for the write cache with your particular configuration. Chances are that you will need more disk space, but exactly how much will depend on your particular workload and how large your RAM buffer is (the larger the RAM buffer, the less disk space you will need), so make sure you test it thoroughly before making the switch.
Check out Part 2 of this blog series for more information on various test configurations and sizing considerations.
Another thing you might want to consider is whether your write-cache storage should be thick or thin provisioned. For information on this topic please refer to my colleague Nick Rintalan’s recent post Clearing the Air (Part 2) – Thick or Thin Provisioned Write Cache.
Finally, I would like to thank my colleagues Chris Straight, Dan Allen, and Nick Rintalan for their input and participation during the gathering of this data.