Using PoolMon to Analyze RAM Cache in Nonpaged Pool Memory

In case you missed it a couple of weeks ago, Andrew Morgan (one of our CTPs) posted a great article on how to accurately determine the size of the new RAM Cache.

As Andrew pointed out in his article, we now use nonpaged pool memory, so it’s fairly easy to fire up PoolMon and investigate. But I wanted to clarify one thing, since Andrew only commented on the key pooltag, ‘VhdR’. (He said he reached out to Citrix for further insight but received no response…so allow me to respond! ;)) Andrew is spot-on that we use ‘VhdR’ for the RAM cache allocations. But we also use ‘VhdL’ for internal metadata allocations. That pool usage will never be very large, but it’s the other pooltag to key on, so grab it as well if you want to incorporate this into any scripts.
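If you do want to script this, below is a rough sketch (mine, not Andrew’s, and not anything Citrix ships) that pulls the same nonpaged numbers PoolMon shows by calling the lightly documented NtQuerySystemInformation API with the SystemPoolTagInformation class that PoolMon itself relies on. It assumes 64-bit Windows with a 64-bit Python and an elevated prompt, so treat it as a starting point rather than a supported tool.

```python
# Minimal sketch: sum the nonpaged pool bytes behind the 'VhdR' (RAM cache)
# and 'VhdL' (metadata) pooltags -- the same counters PoolMon displays.
# Assumes 64-bit Windows + 64-bit Python and an elevated prompt.
import ctypes
from ctypes import wintypes

ntdll = ctypes.WinDLL("ntdll")
SystemPoolTagInformation = 22          # info class used by PoolMon
STATUS_INFO_LENGTH_MISMATCH = 0xC0000004

class SYSTEM_POOLTAG(ctypes.Structure):
    _fields_ = [
        ("Tag", ctypes.c_char * 4),
        ("PagedAllocs", wintypes.ULONG),
        ("PagedFrees", wintypes.ULONG),
        ("PagedUsed", ctypes.c_size_t),
        ("NonPagedAllocs", wintypes.ULONG),
        ("NonPagedFrees", wintypes.ULONG),
        ("NonPagedUsed", ctypes.c_size_t),
    ]

def nonpaged_usage(tags=(b"VhdR", b"VhdL")):
    size = 1024 * 1024                  # start with a 1 MB buffer
    while True:
        buf = ctypes.create_string_buffer(size)
        ret = wintypes.ULONG(0)
        status = ntdll.NtQuerySystemInformation(
            SystemPoolTagInformation, buf, size, ctypes.byref(ret))
        if status == 0:
            break
        if (status & 0xFFFFFFFF) != STATUS_INFO_LENGTH_MISMATCH:
            raise OSError(f"NtQuerySystemInformation failed: {status & 0xFFFFFFFF:#010x}")
        size *= 2                       # buffer too small; retry with a bigger one
    count = ctypes.cast(buf, ctypes.POINTER(wintypes.ULONG)).contents.value
    # The tag array starts after the ULONG count (padded to 8 bytes on x64).
    entries = (SYSTEM_POOLTAG * count).from_address(ctypes.addressof(buf) + 8)
    return {e.Tag.decode(): e.NonPagedUsed for e in entries if e.Tag in tags}

if __name__ == "__main__":
    for tag, used in sorted(nonpaged_usage().items()):
        print(f"{tag}: {used / (1024 * 1024):.1f} MB nonpaged")
```

Add the two values it prints together and you have the full footprint of the RAM cache plus its metadata.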

Using WPA to Really Dig Into PVS

Working at Citrix has its benefits. One of those is being able to talk to the brilliant developers and product architects who write our code and get some “inside info.” In this case, I talked with Moso Lee, who really is the brains behind the new RAM Cache with Overflow to Disk technology (so we all have Moso to thank!).

We were talking about monitoring and debugging PVS, and he quickly pointed out that we’ve always had an ETW event provider for PVS (look for ‘VhdEtw.xml’ in the PVS installation directory). And if you really want to go deep with PVS and identify performance bottlenecks, you might consider using the Windows Performance Analyzer (WPA). I’m not going to go into detail on how WPA or event tracing works, but I do want to provide a quick example of how to use this extremely powerful tool to truly understand and debug PVS. Because if you really want to understand how our PVS driver works, how we’re manipulating the storage stack, or when we’re failing over and writing to the VHDX disk, for example, then this tool and article are for you! It’s certainly not for the average IT admin, but I know all the PVS geeks and filter driver gurus out there will love it.

Let’s get started.

As I mentioned earlier, PVS ships with an ETW event provider that WPA can consume. So, you’ll first want to grab WPA, which is part of the latest SDK for Windows 10. You can selectively install the Windows Performance Toolkit as shown in the screenshot, which includes both WPA and the Windows Performance Recorder (WPR).

First, we’re going to use WPR to record while we simulate some PVS disk and file I/O activity. Then we’re going to analyze what happened with WPA. So, fire up WPR, click “Add Profiles” and point to this file. This is basically a PVS-specific recording profile that lets us capture the events generated by the PVS event provider. You’re welcome. 😉 So, import that profile, match the other options you see in the screenshot below and click the “Start” button.
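As a side note, if you’d rather script the capture than click through the WPR UI, wpr.exe accepts the same profile from the command line (-start and -stop). Here’s a quick sketch of a wrapper (again, mine, not something we ship); the profile and trace paths are just placeholders for wherever you saved the file above and wherever you want the .etl to land.

```python
# Rough sketch: drive the capture from a script instead of the WPR UI.
# Assumes wpr.exe is on the PATH (it ships with the Windows Performance
# Toolkit) and that you run this elevated. Adjust both paths to taste.
import subprocess

PROFILE = r"C:\Temp\PVS.wprp"       # placeholder: wherever you saved the PVS profile
TRACE   = r"C:\Temp\pvs-trace.etl"  # placeholder: where the capture should be saved

def start_capture():
    # -filemode logs to a file rather than the default in-memory buffers
    subprocess.run(["wpr", "-start", PROFILE, "-filemode"], check=True)

def stop_capture():
    subprocess.run(["wpr", "-stop", TRACE], check=True)

if __name__ == "__main__":
    start_capture()
    input("Recording... generate some PVS I/O, then press Enter to stop. ")
    stop_capture()
    print(f"Trace saved to {TRACE} -- open it in WPA.")
```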

Now, we’ll simulate some PVS activity in our lab (or your production environment if you dare!).

In this quick example, I’m using the new write cache method with a small memory buffer of 128 MB (please don’t use a buffer this small in the real world!).  All I’m going to do is copy a 279 MB file to “C:\Users\User\Documents\test.bin” so I can force the PVS driver to not only “write” some data to nonpaged pool, but also fail over and start writing to the local disk (i.e. “D:\vdiskdif.vhdx”), so we can see what really happens during the spillover.  After you’re done copying the file and forcing the buffer to fill and spill over, you can stop the capture in WPR and open the results in WPA.
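By the way, if you don’t happen to have a ~279 MB file lying around, a few lines of Python will generate an equivalent load; the path and size below simply mirror what I used, so adjust them for your own target device.

```python
# Quick load generator: stream ~279 MB of dummy data into the write cache so a
# 128 MB RAM buffer is forced to spill over to the local VHDX file.
import os

TARGET = r"C:\Users\User\Documents\test.bin"  # adjust for your target device
SIZE_MB = 279
CHUNK = 4 * 1024 * 1024                       # write in 4 MB chunks

with open(TARGET, "wb") as f:
    remaining = SIZE_MB * 1024 * 1024
    while remaining > 0:
        n = min(CHUNK, remaining)
        f.write(os.urandom(n))                # random data, nothing compressible
        remaining -= n
    f.flush()
    os.fsync(f.fileno())                      # make sure the writes reach the driver

print(f"Wrote {SIZE_MB} MB to {TARGET}")
```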

In the “Graph Explorer” within WPA, expand “System Activity” and select “Generic Events”.  If you look at the screenshot below, there are a couple of key lines highlighted – WriteData and WriteRamData.  This shows the exact count of writes going to C:\vDisk (2419) and to our VHDX file on the D drive (348).  The amount of “WriteData” actually hitting the disk is less than shown, because it’s cached in RAM and not flushed to disk quite yet.  But let’s keep digging to understand more.
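(As an aside, if you ever want to tally those event counts outside of WPA, one rough option is to dump the .etl to text with the built-in tracerpt tool and count the event names from there. The sketch below assumes the trace path from the earlier wrapper and the two event names visible in the screenshot; whether the names resolve, and how often each one appears per event, depends on how tracerpt renders the PVS provider on your box, so treat it as a sanity check only.)

```python
# Sketch: dump the .etl with the built-in tracerpt tool and do a rough tally of
# the two event names seen in WPA's Generic Events view.
import subprocess

TRACE = r"C:\Temp\pvs-trace.etl"   # placeholder: the capture saved by WPR
DUMP  = r"C:\Temp\pvs-trace.xml"   # placeholder: where the text dump goes

subprocess.run(["tracerpt", TRACE, "-o", DUMP, "-of", "XML", "-y"], check=True)

raw = open(DUMP, "rb").read()
# tracerpt may emit UTF-16 (with a BOM) or UTF-8; handle both before searching.
text = raw.decode("utf-16") if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else raw.decode("utf-8", "replace")

print("WriteRamData:", text.count("WriteRamData"))
print("WriteData:   ", text.count("WriteData"))
```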

Again in the “Graph Explorer”, expand “File IO” and select “Count by Type”.  This picture (and the following screenshot) shows the reduction in I/O (file count) and the elapsed time between the writes to C:\Users\User\Documents\test.bin and the writes to the spillover write cache file at D:\vdiskdif.vhdx.  Very powerful stuff, so you can easily identify pesky performance bottlenecks and rule out the PVS filter driver as the culprit. 😉

I think it’s probably wise to stop there, since this is a lot to digest, I’m sure.  But for those PVS geeks out there like Andrew, you can absolutely go deeper and understand exactly where the data is being written initially (and where it eventually lands) using disk offsets.  Just go back to “Generic Events” and tweak the column view, and WPA will show the data transitioning through the various storage layers.  And then, if you really want to blow your mind, go back to your PVS environment, set the RAM cache buffer to 0 MB and re-run the Recorder and Analyzer.  Then you’ll get a really clear picture of how we spill over to disk!

As I mentioned earlier, using WPA to debug PVS is probably not for everyone.  In fact, it’s probably not for 99% of our customers.  But the next time you think you have a performance problem related to PVS and ProcMon & Wireshark aren’t cutting it for you, this is a great tool to have in your bag of tricks.  It’s really the only way to understand what our PVS driver is doing at a low level.  I hope you enjoyed this article, and I’d like to give one last shout-out to Moso Lee – he deserves most of the credit for this article, and we all should send him an email thanking him for giving us this new write cache (wC) implementation!

-Nick

Nick Rintalan, Lead Architect – Americas, Citrix Consulting

Moso Lee, Software Engineer – PVS, Citrix Product Development