PVS Internals #4: vDisk Stores and SMB3

At Citrix Consulting, we are often asked what can be optimized in Provisioning Services when SMB3 file shares are used as the vDisk store. There are already some articles about this out there, such as this blog post about Provisioning Services 5.x and SMB 1/2, as well as some other third-party blog posts. Unfortunately, most of these articles are several years old. Are they still valid in 2018 and do they also apply to newer versions of Provisioning Services and SMB? Let’s take a look.

First things first

With Provisioning Services, you can either have local stores where each vDisk file is available locally on each Provisioning Server (with manual or automated replication), or centralized stores based on a file share where each vDisk file can be accessed simultaneously from multiple Provisioning Servers. In this post I’ll not discuss which store type is the best (or even worse, which is “Best Practice”), because it depends on so many factors. This post is about centralized stores based on SMB shares, and in the end I’ll name some circumstances where this store type is feasible. If you desire guidance on the proper store type, check out this article from Martin Zugec.

The caching principle

In many of the articles we’ve read so far, we’ve learned that caching is key to making Provisioning Services successful. If you have already read them, perfect. If not, I’ll do a quick recap. Each vDisk block that is streamed to a target device needs to be read from the store. This can put a high load on the storage, depending on where the vDisks are hosted. Caching helps reduce storage access for subsequent operations. Each block that has been read once will be stored in memory (the Windows cache), so each subsequent access to an already read block will be served from the Provisioning Server’s memory. This happens transparently to the applications, and from the Provisioning Server’s point of view, it still looks like the vDisk is being read from storage. Once the Provisioning Server is “warmed up”, meaning all frequently accessed vDisk blocks are in the cache, the actual storage access will be dramatically minimized, providing the best possible streaming experience and decoupling streaming performance from storage performance. Martin Zugec explained caching in great detail in this excellent blog post. Caching works out of the box for local files (i.e. files that are read (buffered) from a locally attached volume such as an NTFS-formatted disk). We also call this the file system cache. In this article, however, we are talking about SMB, or remote storage…
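To make the warm-up effect tangible, here is a minimal Python sketch of a read-through block cache. It is only a toy model, not how the Windows cache is implemented: the `Store` and `CachingReader` classes and the block count of 1000 are assumptions made purely for illustration.

```python
class Store:
    """Stand-in for the slow vDisk store (hypothetical); counts every read."""

    def __init__(self):
        self.reads = 0

    def read_block(self, block_no):
        self.reads += 1
        return block_no.to_bytes(8, "little")


class CachingReader:
    """Read-through cache: each block is fetched from the store only once."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read_block(self, block_no):
        if block_no in self.cache:
            self.hits += 1
            return self.cache[block_no]
        self.misses += 1
        data = self.store.read_block(block_no)
        self.cache[block_no] = data
        return data


def boot_target(reader, blocks):
    """Simulate one target device reading its boot blocks."""
    for block_no in blocks:
        reader.read_block(block_no)


store = Store()
reader = CachingReader(store)
boot_target(reader, range(1000))  # first boot: every block comes from the store
boot_target(reader, range(1000))  # second boot: every block comes from memory
print(store.reads, reader.hits, reader.misses)
```

In this model the second boot causes no additional store reads at all, which is the decoupling of streaming performance from storage performance described above.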

The oplock dilemma

Earlier versions of SMB introduced a concept called ‘opportunistic locking’, or ‘oplocks’ for short. Let’s do a short recap.

Think of oplocks as a kind of ‘exclusivity agreement’ between the client and the server that allows the client to cache read and/or write operations, thus reducing network activity and sensitivity to latency.

For example, a client opens a document from an SMB share in read/write mode and requests an oplock. The server checks if there are already other oplocks active for the file and if not, the request from the client is granted. Now the client can leverage local memory for read and write caching when working with the network-based file because it can be sure that no other client reads or writes that file. This type of oplock is called Batch oplock and it’s exclusive for a single client.

If multiple clients must access the same file, Level II oplocks will be used. They allow for read caching only. Think of it as a ‘promise’ from the server that the file will not be altered while the oplock persists, so the client can safely cache data locally without constantly having to come back to the server to read data that has already been read.

Unfortunately, there are downsides with oplocks. For example, a client has read/write access to a shared file and the corresponding batch oplock is in place. If the client fails and another client wants to have read/write access to the same file, this type of access will not be granted by the server until the initial oplock from the first client has expired. Translating this to the Provisioning Services world: if one PVS server holds read/write access to a file and fails, another server will not be immediately able to access the same file in read/write mode.

In the past, the choice was between having oplocks enabled (perfect caching, but poor failover) or disabled (no caching, but instant failover). Choose your poison! The PVS product team opted for better failover, which came at the cost of efficient caching: during installation, earlier PVS versions disabled oplocks on the PVS server.

As Dan already outlined in his article, this default behavior did not fit modern PVS environments. Read/write access to the store, and thus quick failover, is only required in two scenarios: first, when using write cache on the server, and second, for Private Mode vDisks or Maintenance Mode versions. As for the first scenario, we always recommend against using write cache on the PVS server because of bad performance, the huge network impact and massive HA requirements. In 2018, I think we can safely recommend “Cache in RAM with overflow to disk” as the no-brainer cache type for almost every PVS deployment. And even for Private/Maintenance Mode vDisks there is no reason to omit caching just to have better HA for these vDisks. In fact, we have recommended dedicated development stores for some time, and such stores generally have lower HA requirements than production stores.

So, in a typical modern PVS environment we always prefer proper caching for vDisk files over failover-ability for files in read/write mode. With this changed priority in mind, the oplock deactivation configured by the PVS installer was no longer recommended, and after applying the ‘tuning recommendations’ from Dan’s article by re-enabling oplocks, caching was basically restored to the Microsoft defaults. Thankfully, the PVS installer from Provisioning Services 7.9 onwards no longer disables oplocks, so caching works out of the box.

Traditional oplocks, however, were only existent with SMB 1.x and 2.0. Microsoft decided to change how things work in the new SMB 2.1 world…

Leases to the rescue?

SMB 2.1 was introduced with Windows 7 and Windows Server 2008 R2, and it came with improvements in terms of oplocks. Actually, traditional oplocks have been replaced with so-called leases. There are three types of leases:

Read Caching
Write Caching
Handle Caching

At first glance, they work similarly to oplocks, but they allow for more flexibility. Leases offer enhanced file and metadata caching abilities, and they limit the amount of data exchanged between client and server, which in turn reduces file server load and increases scalability. Unlike oplocks, leases cannot be disabled on the SMB client by setting the ‘EnableOplocks’ key to zero. They are enabled by default if both client and server support SMB 2.1 or higher. Leasing can, however, be disabled on the file server or filer, in which case the client falls back to oplocks. Please don’t do it. Leasing, by the way, also works with the ‘suboptimal’ oplock-deactivating registry keys in place because those keys don’t affect SMB 2.1 or higher. I’d still not recommend keeping those keys.
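On the wire, the three lease types are simply bits in a lease state field. The following Python sketch decodes such a value into the short “RWH” notation often used for leases. The bit values are the ones defined in the SMB2 protocol specification; the `LeaseState` class and the `describe` helper are names I made up for this illustration.

```python
from enum import IntFlag


class LeaseState(IntFlag):
    """Lease state bits as defined in the SMB2 protocol specification."""
    NONE = 0x00
    READ_CACHING = 0x01
    HANDLE_CACHING = 0x02
    WRITE_CACHING = 0x04


def describe(state):
    """Return the short 'RWH'-style notation for a lease state."""
    letters = ""
    if state & LeaseState.READ_CACHING:
        letters += "R"
    if state & LeaseState.WRITE_CACHING:
        letters += "W"
    if state & LeaseState.HANDLE_CACHING:
        letters += "H"
    return letters or "none"


exclusive = LeaseState(0x07)  # read + write + handle caching
shared = LeaseState(0x03)     # read + handle caching only
print(describe(exclusive), describe(shared), describe(LeaseState(0)))
```

A single client that has a file all to itself would typically end up with the first combination, while several readers of the same file share the second.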

SMB3, as the latest SMB incarnation, continues to offer all these leasing benefits and adds Directory Leasing as a new feature, which further improves the whole caching ecosystem.

So, in theory, when using SMB 2.1 or even better SMB 3.x, caching should work out of the box and no ‘tuning’ should be required, correct? Well, let’s have a look under the hood…

A Deeper Look at the Protocol

I’ll uncover the details of streaming vDisks from SMB file shares. I’ve taken all traces directly from both Provisioning Servers in my lab farm via port mirroring.

Here is a quick overview of my lab environment:

2x PVS 7.15 on Windows Server 2016 (IP: 192.168.100.203 and .204)
1x File server on Windows Server 2012 R2 (IP: 192.168.100.102)
8x target devices
Dedicated Streaming Network
One vDisk (VHDX format) with XenApp 7.9 VDA and Server OS workload hosted on the aforementioned file server and assigned to all eight target devices.

The first thing I wanted to know was how protocols are negotiated and if PVS really leverages the latest and greatest SMB3. So, I powered on my first target device, and immediately after the bootstrap completed, vDisk streaming was initiated. Wireshark showed first SMB activities:

SMB2 negotiation traffic can easily be filtered by its command ID, which is 0x00. Additionally, I’ve included SMB (SMB 1.x) traffic because each negotiation starts with the lowest supported version, and the first packet would not be visible if the trace were filtered for SMB2 only. Negotiating SMB 2.1 and higher typically involves two request/response pairs; this process is called Multi-Protocol Negotiation.
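In Wireshark, a display filter along the lines of `smb || smb2.cmd == 0` should produce a comparable view. If you would rather see what such a filter actually looks at, the Python sketch below classifies a raw SMB message by its header signature and command field. The header offsets follow the SMB and SMB2 protocol specifications; the function names and the `fake_smb2_header` helper are hypothetical and exist only for this example.

```python
import struct

SMB1_MAGIC = b"\xffSMB"    # SMB 1.x header signature
SMB2_MAGIC = b"\xfeSMB"    # SMB 2.x/3.x header signature
SMB1_COM_NEGOTIATE = 0x72  # SMB 1.x negotiate command
SMB2_NEGOTIATE = 0x0000    # SMB2 negotiate command (the 0x00 mentioned above)


def classify(message):
    """Return (protocol, command) for a raw SMB message without transport framing."""
    if message[:4] == SMB1_MAGIC and len(message) >= 5:
        return "SMB", message[4]  # 1-byte command right after the signature
    if message[:4] == SMB2_MAGIC and len(message) >= 64:
        (command,) = struct.unpack_from("<H", message, 12)  # 2-byte command at offset 12
        return "SMB2", command
    return "unknown", None


def is_negotiate(message):
    """True for both the SMB 1.x and the SMB2 flavor of a negotiate message."""
    return classify(message) in {("SMB", SMB1_COM_NEGOTIATE), ("SMB2", SMB2_NEGOTIATE)}


def fake_smb2_header(command):
    """Build a zeroed 64-byte SMB2 header carrying only the command (demo helper)."""
    return SMB2_MAGIC + struct.pack("<HHIHH", 64, 0, 0, command, 1) + bytes(48)


print(is_negotiate(b"\xffSMB\x72" + bytes(27)))  # SMB 1.x negotiate
print(is_negotiate(fake_smb2_header(0x0000)))    # SMB2 negotiate
print(is_negotiate(fake_smb2_header(0x0005)))    # SMB2 create, not a negotiate
```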

In the first step the client requests all SMB dialects it is able to understand.

There is a lot of legacy stuff in this list you’ll probably not use anymore, but the most interesting part is the last dialect, SMB 2.???. It’s basically a placeholder for newer dialects and the client, in our case the PVS server, is effectively telling the file server that it’s able to understand dialects newer than SMB 2.002.

The server responds with the highest dialect supported, and since my file server is able to do SMB3, the negotiation response includes the mystic SMB 2.???. You may also notice that from this moment on, Wireshark already tags packets as SMB2 and no longer as SMB.

Now that the client knows that the server supports even more modern dialects, it sends another negotiate request to the server. You know, the infamous Multi-Protocol Negotiation. This time the request includes all supported SMB2 dialects. Since my PVS server is based on Windows Server 2016, it supports all dialects up to SMB 3.1.1 and offers each dialect.

In case you are wondering why Wireshark still shows “Server Message Block version 2”, the reason is quite simple. What we broadly call SMB3 is technically an evolution of SMB2. In fact, SMB3 was developed under the code name SMB 2.2, but Microsoft changed the name later to better emphasize the major changes and features that come with the new protocol version, such as SMB Direct or SMB Multichannel.

Another indication that SMB2 and SMB3 share common technical ground is an article from Microsoft explaining how to disable SMB versions. You can enable and disable either SMB1 or SMB2/3, but not SMB2 and SMB3 separately.

In case you didn’t know, these are the minimum required Windows versions for each dialect.

OK, back to the negotiation. After the PVS server offered its supported dialects, my file server replied with its highest supported dialect.

Guess what, my lab file server is a Windows Server 2012 R2 machine, so it supports only up to SMB 3.0.2. Note to myself: consulting life is not always compatible with constantly keeping lab machines up to date. 🙂 For this demonstration it’s still fine since the differences between SMB 3.0.2 and SMB 3.1.1 are rather small and the outcome of this article would not change in any way.
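The outcome of this second negotiation step boils down to “pick the highest dialect both sides understand”. Here is a small Python sketch of that idea. The dialect revision codes and the Windows versions that introduced them are taken from Microsoft’s protocol documentation; the `negotiate` function itself is a simplification of mine and not the real server logic.

```python
# Dialect revision code -> (name, first Windows client / server version supporting it)
DIALECTS = {
    0x0202: ("SMB 2.0.2", "Windows Vista / Windows Server 2008"),
    0x0210: ("SMB 2.1", "Windows 7 / Windows Server 2008 R2"),
    0x0300: ("SMB 3.0", "Windows 8 / Windows Server 2012"),
    0x0302: ("SMB 3.0.2", "Windows 8.1 / Windows Server 2012 R2"),
    0x0311: ("SMB 3.1.1", "Windows 10 / Windows Server 2016"),
}


def negotiate(client_offers, server_supports):
    """Return the highest dialect revision offered by the client and known to the server."""
    common = set(client_offers) & set(server_supports)
    if not common:
        raise ValueError("no common SMB2 dialect")
    return max(common)


pvs_server = list(DIALECTS)                        # Windows Server 2016 offers all dialects
file_server = [0x0202, 0x0210, 0x0300, 0x0302]     # Windows Server 2012 R2
chosen = negotiate(pvs_server, file_server)
print(DIALECTS[chosen][0])
```

With the two lab machines from this article as input, the sketch lands on SMB 3.0.2, matching the trace.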

Essentially, we learn that PVS leverages the latest protocol version without modifying any registry key or doing other fancy tuning stuff.

By the way, Windows Server 2012 and newer allow you to uninstall SMB 1.x, and if you do this, negotiation will start right off with the second negotiation pair (in my example this would be packet #98607), negotiating between the SMB 2.x and SMB 3.x dialects. I haven’t uninstalled SMB 1.x on my lab PVS servers so that I can show you the full negotiation process, but in production I’d always uninstall SMB 1.x. Not only does it save you two packets during negotiation, SMB 1.x is also vulnerable on some servers. There is really no reason to leave the SMB 1.x optional component enabled on a Provisioning Server.

Back to the analysis, the next insights are about leasing. While SMB communication begins with negotiation and continues with some other protocol stuff I don’t further outline here, at some point the actual file is being accessed and I want to learn about the caching abilities that PVS benefits from. In my trace I filtered for SMB lease traffic, and it’s clearly visible how the PVS server opens the vDisk file on the share.

Let’s have a deeper look into this communication, more precisely the “Create Request File” request and the “Create Response File” response, whose primary task is basically to agree the modalities of the file access. First the request.

By looking at the request from the PVS server to the file server, we first notice that the oplock type is 0xff which essentially translates to “use modern leases, no oplocks”. We also see the vDisk filename, and at the bottom of the screen we see the client requesting all three types of leases: read-caching lease, handle-caching lease, and write-caching lease.
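For reference, the oplock level in a create request is a single byte. The Python sketch below maps the values defined in the SMB2 protocol specification to readable names; the `OplockLevel` class and the `uses_lease` helper are names chosen just for this illustration.

```python
from enum import IntEnum


class OplockLevel(IntEnum):
    """RequestedOplockLevel values from the SMB2 protocol specification."""
    NONE = 0x00
    LEVEL_II = 0x01
    EXCLUSIVE = 0x08
    BATCH = 0x09
    LEASE = 0xFF  # "no classic oplock, see the lease create context instead"


def uses_lease(requested_oplock_level):
    """True if the create request asks for a lease rather than a classic oplock."""
    return OplockLevel(requested_oplock_level) is OplockLevel.LEASE


print(OplockLevel(0xFF).name, uses_lease(0xFF), uses_lease(0x09))
```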

Let’s take a look at the file server’s reply.

The server has granted all three requested caching leases. This is because no other client was accessing the vDisk file at the same time. My environment consists of two PVS servers and eight target devices, but I had just one target device powered on at that time. And even though Provisioning Services opens the vDisk file (VHDX) in read-only mode, the client always tries to get the “maximum” lease capabilities, which means: read, write and handle. In addition, the reply contains a “Lease Epoch” set to 1. We will come back to this later.

In the next step I checked what happens when multiple PVS servers are accessing the same vDisk file. I powered on the second target device hoping that the PVS load balancing directed me to the second PVS server in my farm, and indeed, that happened. So let’s have a look at the following SMB communication.

This trace was recorded via port mirroring so it shows traffic from and to both PVS servers. Again, I’ve filtered for all lease-related SMB traffic. The first request (#98688) is the initial vDisk file access request from the second PVS server to the file server. It looks very similar to the earlier “Create Request File” request from the first PVS server.

Since the second PVS server does not know if other clients are currently accessing the same file, it requests the same three lease types as the first PVS server.

Now it’s getting really interesting. The next SMB packet (#98691) is sent from the file server to the first PVS server, which is currently the only client accessing the vDisk file. Since the second PVS server wants to access the same file, the file server needs to notify the first PVS server about the changed modalities. More precisely, the first server can no longer do write caching since at least two clients are now accessing the same file. Wireshark labels this packet a “Break Response” (technically it is a lease break notification); it tells the first PVS server to request a lease change from read, write and handle caching to read and handle caching only. You also see that the lease epoch changes to “2”.

Now that the first PVS server knows it will no longer be the only client accessing the vDisk file, it sends a “Break Request” to the file server, thereby requesting the mandated lease change:

The file server responds with a “Break Response” containing the updated lease and the new caching modalities. The first PVS server is now no longer allowed to perform write caching, which the PVS server doesn’t need anyway since it only offers read-only vDisks in Standard Mode.

Finally, now that the first PVS server promised to no longer do write caching, the file server responds to the second PVS server’s initial “Create Request File” request (#98688) with a “Create Response File” response (#98731) and a proper lease.

Guess what, this lease is for read caching and handle caching only right from the start.

The really nice benefit of the lease solution is that even when the lease changes from read/write/handle to read/handle only, the read cache will not be flushed since both lease states allow for read caching. This is an enhancement over traditional oplocks in SMB 1.x and 2.0.
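The whole exchange can be condensed into a few lines of Python. Treat this as a toy model only: the real protocol includes the break acknowledgment round trip shown in the trace, and a real file server considers far more conditions. The `FileServer` class, the client names `pvs1` and `pvs2`, and the assumption that every new lease starts at epoch 1 are mine.

```python
class FileServer:
    """Toy model of lease handling for a single file (not the real algorithm)."""

    def __init__(self):
        self.leases = {}  # client name -> [lease state, lease epoch]
        self.log = []

    def open(self, client, requested="RWH"):
        # A second opener means nobody may keep write caching: break "W" leases first.
        for other, lease in self.leases.items():
            if other != client and "W" in lease[0]:
                new_state = lease[0].replace("W", "")
                lease[1] += 1  # the epoch increases with every lease state change
                self.log.append(f"break {other}: {lease[0]} -> {new_state} (epoch {lease[1]})")
                lease[0] = new_state
        shared = any(name != client for name in self.leases)
        granted = requested.replace("W", "") if shared else requested
        self.leases[client] = [granted, 1]  # assumption: new leases start at epoch 1
        self.log.append(f"grant {client}: {granted}")
        return granted


server = FileServer()
first = server.open("pvs1")   # only client so far: full RWH lease
second = server.open("pvs2")  # triggers a break on pvs1, then gets RH
print(first, second)
print("\n".join(server.log))
```

Note that the “R” never leaves the first server’s lease state in this model, which mirrors why its read cache survives the break.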

So, from a protocol perspective we are fine. Everything seems to work as desired. We don’t need to adjust any registry keys to allow for proper caching: it works right out of the box!

Caching – uncovered

One should think that caching works perfectly since we just validated the protocol side, but in the next step I’d like to take a deeper look at network utilization and the cache itself.

I’ve performed the following test on both Provisioning Servers, but since the results were quite similar, I’ve decided to show just the results from one server. On each PVS server I’ve recorded the following PerfMon counters using the Data Collector feature:

Network Interface\Received Bytes (LAN Interface): This counter indicates incoming traffic on the PVS server; in this case it comprises the vDisk data loaded from the file share. Even though this interface is also responsible for all other traffic such as domain, GPO, etc., this additional traffic is negligible. I did not measure ‘Sent Bytes’ on this interface since all vDisk operations are read-only.
Network Interface\Sent Bytes (Streaming Interface): This is the traffic sent from the PVS server to target devices. As mentioned earlier, I’ve been using a dedicated streaming network for this test, so you can expect this counter to only deliver the actual vDisk streaming data.
Cache\Copy Read Hits %: This counter indicates the percentage of cache hits, i.e. how often a page request can be served from cache. The counter can reach up to 100%.
‘Combined (Sum) Cache Memory’: This isn’t actually a single counter; rather, it comprises two standalone counters:
- Memory\Cache Bytes: Cached data on the active list
- Memory\Standby Cache Normal Priority Bytes: Cached data on the standby list

The sum of both counters (plus three other counters which are not relevant for this test) equals the ‘Cached’ memory value in Task Manager and can be considered as ‘data in the cache’.
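If you export such a data collector log to CSV, the combined value is easy to compute yourself. The following Python sketch shows one way to do it. The sample data, the host name `PVS01` and the exact column layout are assumptions for illustration, so adjust the counter suffixes to whatever your export contains.

```python
import csv
import io

# Hypothetical two-sample export; a real log would be read from a file instead.
SAMPLE_CSV = r'''"Time","\\PVS01\Memory\Cache Bytes","\\PVS01\Memory\Standby Cache Normal Priority Bytes"
"10:00:00","104857600","209715200"
"10:00:10","52428800","262144000"
'''

# Counter path suffixes that together make up the combined cache memory.
COUNTERS = (
    r"\Memory\Cache Bytes",
    r"\Memory\Standby Cache Normal Priority Bytes",
)


def combined_cache_mb(csv_text):
    """Return (time, combined cache size in MB) for every sample in the export."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = sum(float(value) for name, value in row.items() if name.endswith(COUNTERS))
        samples.append((row["Time"], total / 2**20))
    return samples


for time, megabytes in combined_cache_mb(SAMPLE_CSV):
    print(time, megabytes)
```

In the made-up sample, memory moves from the active list to the standby list between the two samples while the combined value stays at 300 MB.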

For my first test, I started three target devices, one after the other. Between each start I’ve waited around one minute and ten seconds. Let’s have a look at network utilization and cache hits.

The first thing we notice is that during the first boot, counters for ‘Received Bytes’ (red line; from the network file share) and ‘Sent Bytes’ (blue line; to the target device) look pretty much the same. That makes it very clear that without proper caching in place, network traffic is basically doubled. The counter for ‘Copy Read Hits %’ (grey area) shows just a few spikes and never really reaches 50% or more.

This changes when the second target device starts. We still notice some traffic from the file share to the PVS server, but most of this traffic most likely consists of post-boot activity on the first target device, such as starting services. The counter ‘Copy Read Hits %’ almost never goes below 50%, showing increased cache activity. And finally, when the second target device enters multi-I/O mode, cache benefits really start kicking in. This is when the blue line for ‘Sent Bytes’ hits up to 60 Mbit/sec. Instead of being limited to storage speed (i.e. file share performance), PVS can now serve nearly all required vDisk content from the fast memory cache, which allows streaming traffic to utilize the full link speed. The counter ‘Copy Read Hits %’ is now at 100% almost continuously, which supports my theory.

When the third target device starts the cache can safely be considered as ‘warmed up’. Streaming traffic hits a new maximum (65 Mbit/sec.), the counter ‘Copy Read Hits %’ sticks with 100%, and the whole boot process took not even half as long as for the first target device.

Let’s take the same period of time and have a look at the cache counters.

The diagram shows the cache growing until the first target device completes its entire boot phase. The second target device already benefits from the warmed-up cache, and with the third target device it finally becomes clear that the cache is no longer growing.

In case you were wondering how much memory is required for proper cache sizing, Martin Zugec already gave some guidance on this topic. The set of counters I’ve used in this example can also be useful for verifying cache utilization in an existing environment or in a pilot/proof of concept. My example workload (Windows 2012 with RDS, XenApp VDA 7.9) clearly utilizes roughly 900 MB for caching, and this is just the plain boot process including things like GPO and domain processing, without any user or application activity.

In my next test I put more stress on the environment as I started five more target devices in addition to the first three devices. The results are not very surprising.

Streaming throughput reaches new heights as vDisk content is served almost entirely from memory. There is nearly no network activity towards the file share. The result would also look similar if we’d start even more target devices.

If we look at the cached bytes during this period there is also no surprise.

Cached bytes are growing very slowly but most data can be served from cache.

Rebooting all devices

Next, I decided to reboot all eight target devices. Voila:

Again, I’ve reached a new streaming throughput maximum since my PVS server’s memory (and so the cache) is obviously faster than my network. (Hint: unless you are using very old hardware, this is usually the case :-). Caching seems to work perfectly and PVS still doesn’t seem to touch the file share.

Let’s have a look at the cache counters, and this time I’m not only showing my combined ‘Cache Memory’, but also its single elements ‘Cache Bytes’ and ‘Standby Cache Normal Priority Bytes’. You’ll notice why:

The combined cache memory always stays the same, but as I shut down all target devices, cached bytes were moved from the active ‘Cache Bytes’ to the ‘Standby Cache Normal Priority Bytes’. This is expected behavior since no target device requires these cached bytes at that moment. You can also see that during reboot these bytes were moved back to the active list when target devices were re-requesting vDisk data. This cache change happens in memory, so no further reads from the network share are necessary.

File open vs. file close

It’s very important to mention that the file handle from the Provisioning Server to the vDisk VHDX file has never been closed during this test. While all target devices were rebooted at the same time, i.e. the reboot command was sent synchronously, there was always a little deviation during that process, which prevented the ‘connected devices’ counter in PVS from dropping to zero. In my last test I’m showing what happens when there is a longer break between shutdown and restart of target devices. I’ve shut down all eight devices, waited around two minutes, and finally started all eight devices at once. Here are the results.

It seems the cached data was gone since there was again heavy network traffic between the file share and the PVS server. But still the cache got filled quite quickly since we immediately see good ‘Copy Read Hits %’ values, and streaming throughput was still much higher than data requested from the file share.

So, let’s take a look at the cache values.

There are three important things to notice here. First, after all devices have been shut down, the active list merges into the standby list, just as seen in the previous test where I rebooted all target devices at once (#1). Then, roughly ten seconds later, PVS closes the file handle to the vDisk, and all cached vDisk data gets flushed immediately (#2). Finally, I booted all devices two minutes later, and that behavior is well known: the cache fills up again and subsequent target devices can benefit from it. But why does the cache get flushed in this case?

File System Cache vs. Redirector Cache

Accessing files on a network share always involves the so-called redirector. Think of the redirector as an interface between the client and the file server’s file system. It basically allows accessing remote files over the network using a network file protocol such as SMB. The redirector is responsible for caching and can do so as long as the file is open and the lease allows for caching. When Provisioning Services closes the file, the lease is released and thus the PVS server can no longer cache data.

On the other hand, when leveraging local vDisk stores based on block devices such as an NTFS-formatted drive, there is no redirector involved. Instead, the file blocks that have been read are cached as part of the file system cache. This mechanism works regardless of the file state. Even if files are already closed, they may persist in cache, unless they are changed, moved, or deleted.
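The difference between the two cache types can be sketched in a few lines of Python. This is a deliberately simplified model: the `BlockCache` class and its `flush_on_last_close` switch are hypothetical, and a real file system cache can of course still evict data under memory pressure.

```python
class BlockCache:
    """Toy cache that optionally drops its content when the last handle is closed."""

    def __init__(self, flush_on_last_close):
        self.flush_on_last_close = flush_on_last_close
        self.open_handles = 0
        self.blocks = set()

    def open(self):
        self.open_handles += 1

    def read(self, block_no):
        """Return True on a cache hit; cache the block either way."""
        hit = block_no in self.blocks
        self.blocks.add(block_no)
        return hit

    def close(self):
        self.open_handles -= 1
        if self.open_handles == 0 and self.flush_on_last_close:
            self.blocks.clear()  # lease released, so cached data is discarded


def hits_after_reopen(cache, blocks=range(100)):
    """Read all blocks, close the file, reopen it and count hits on the second pass."""
    cache.open()
    for block_no in blocks:
        cache.read(block_no)
    cache.close()
    cache.open()
    hits = sum(cache.read(block_no) for block_no in blocks)
    cache.close()
    return hits


redirector_hits = hits_after_reopen(BlockCache(flush_on_last_close=True))
file_system_hits = hits_after_reopen(BlockCache(flush_on_last_close=False))
print(redirector_hits, file_system_hits)
```

The redirector-style cache starts from zero after the file has been closed and reopened, while the file-system-style cache still serves every block from memory.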

Does this mean the file system cache is more efficient than the redirector cache, and especially for Provisioning Services, are local vDisk stores better than shared stores on file servers or filers? Well, I’d say the downside of redirector cache being flushed on file close does not really hurt in production environments since target devices are typically powered on, and the cache persists as long as at least one target device is booted, which is the case most of the time. And even if all target devices need to be shut down, this typically happens during server maintenance (e.g. Windows Updates) where a server restart would also empty the file system cache. So I don’t see any serious disadvantages of shared stores compared to local stores in terms of caching.

Nevertheless, even with proper caching in place, you should not underestimate the requirements of file share storage, especially in terms of high availability. The cache won’t help in case of an outage, so plan for high availability, for example by implementing a file services failover cluster or leveraging redundant filer heads.

Summary

So, in case you just stepped in at this point (and of course also if you spent the last minutes reading all the text above), here are the key takeaways:

Using file share storage for vDisks is still a valid and recommended approach, if sufficient capacity and high availability are in place. Especially in environments without redundant or highly available file services, I’d rather recommend local replicated vDisk stores instead of a single file server, which definitely constitutes a single point of failure. And no one needs to invest in expensive file clustering solutions if their only purpose would be to provide file shares for vDisk storage.
Caching will help to significantly reduce the load on file servers or filers. Unlike earlier versions, there are no tuning requirements, neither for the PVS servers, nor for file servers. The defaults are perfectly fine. Sizing guidelines for PVS server memory (RAM) from past articles still apply.
Consider the caching behavior of the SMB redirector. There is a chance that shutting down all target devices connected to a particular vDisk will also remove the vDisk’s cache entries and the cache will need to be warmed up again on the next boot.
Leverage the latest SMB protocol if possible; at the time of writing this article it’s SMB 3.1.1. Not only do Windows file servers support SMB3, but also modern filers such as NetApp, and even my home lab Synology is able to support SMB3.
Leasing is key for caching, so don’t use SMB 1.x or 2.0 (in fact you shouldn’t even enable the optional SMB 1.x feature on your PVS servers), and forget about any ‘oplock tuning’. Anything from SMB 2.1 onwards is fine with its defaults, but SMB 3.x brings added features that might be beneficial for you, such as ODX when taking copies of a vDisk file.

I hope you enjoyed reading this article. Leave any comments or questions below. I’d like to thank Martin Zugec and Saša Petrovic for helping me with this blog article, and especially I’d like to thank my customers, partners, and colleagues who have been asking me again and again to write about this topic for a long time. 🙂
