While everyone using Provisioning Services (PVS) in a production environment should be aware of the basic networking best practices outlined in CTX117374 or the design considerations in CTX125126, there are a couple of advanced configuration options that are not widely known. Today I would like to discuss some of these.

Subnet Affinity

The Subnet Affinity feature, which was introduced with PVS 5.6, is an advanced load balancing algorithm that enables us to build PVS blocks.
After enabling load balancing for a vDisk, the following Subnet Affinity options can be set (a conceptual sketch of the selection logic follows the list):

  • None – ignore subnets; use the least busy server.
  • Best Effort – use the least busy server/NIC combination from within the same subnet. If no server/NIC combination is available within the subnet, select the least busy server from outside the subnet. If more than one server is available within the selected subnet, perform load balancing between those servers. Best Effort is the default setting.
  • Fixed – use the least busy server/NIC combination from within the same subnet. Perform load balancing between servers within that subnet. If no server/NIC combination exists in the same subnet, do not boot target devices assigned to this vDisk.
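
To make the difference between the three options a bit more tangible, here is a minimal conceptual sketch in Python of how such a selection could work. This is not PVS source code; the server names, subnets and the simple "active devices" busyness metric are illustrative assumptions only.

# Conceptual sketch (not PVS source code): how the three Subnet Affinity
# settings could influence which server streams to a booting target device.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StreamServer:
    name: str
    subnet: str          # subnet of the streaming NIC
    active_devices: int  # simple "least busy" metric

def pick_server(servers: List[StreamServer],
                target_subnet: str,
                affinity: str) -> Optional[StreamServer]:
    """Return the server that should stream to a target in target_subnet."""
    least_busy = lambda candidates: min(candidates, key=lambda s: s.active_devices)

    if affinity == "None":
        # Ignore subnets entirely, just take the least busy server.
        return least_busy(servers)

    same_subnet = [s for s in servers if s.subnet == target_subnet]

    if affinity == "Fixed":
        # Only servers in the target's subnet are allowed; otherwise don't boot.
        return least_busy(same_subnet) if same_subnet else None

    # "Best Effort": prefer the local subnet, fall back to any server.
    return least_busy(same_subnet) if same_subnet else least_busy(servers)

# Example: two PVS servers inside the blade chassis (10.0.10.0/24), one outside.
servers = [
    StreamServer("PVS-BLADE-01", "10.0.10.0/24", 120),
    StreamServer("PVS-BLADE-02", "10.0.10.0/24", 80),
    StreamServer("PVS-DC-01",    "10.0.20.0/24", 10),
]
print(pick_server(servers, "10.0.10.0/24", "Best Effort").name)  # -> PVS-BLADE-02

Note how Fixed returns nothing as soon as no server in the target's subnet is available, which is why targets assigned to such a vDisk would simply not boot in that case.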

So now imagine you’re using blade servers, where two PVS servers as well as your provisioned virtual desktops or XenApp servers reside within the same chassis / enclosure. You would end up with something similar to the diagram below:

What we can do now is specify a dedicated subnet for Provisioning Services traffic that spans all systems within that chassis. When configuring Subnet Affinity as Best Effort in such a scenario, all PVS targets within the chassis will be streamed by one of the two PVS servers. If one server goes down, the other takes over. If the targets perform the initial PVS logon with a server outside of the chassis, they will be redirected automatically to servers within their subnet for the actual streaming I/O. If both servers go down, the targets will connect to a server outside the chassis.
Doing so allows us to use the high-performance network connections between the blades (typically 10 Gbit/s per blade) for streaming the vDisks and keeps basically all streaming traffic inside our PVS block.

Local and Remote Concurrent I/O limits

These limits can be set within every PVS server’s advanced properties dialog, which many people are afraid of for various reasons. I’ve seen customers, partners and even Citrites tell everyone not to change any value in here, as it will most likely break your PVS infrastructure. While this holds true for e.g. changing the MTU size (unless you really know what you’re doing), some of the configurable items can improve streaming performance without much risk. Two of these settings are the concurrent I/O limits for local and remote vDisk stores. As the PVS help is very informative for these settings, I have simply pasted it here:

“Controls the number of concurrent outstanding I/O transactions that can be sent to a given storage device. A storage device is defined as either a local drive letter (C: or D: for example) or as the base of a UNC path, for example \\ServerName.

Since the PVS service is a highly multi-threaded service, it is possible for it to send hundreds of simultaneous I/O requests to a given storage device. These are usually queued up by the device and processed when time permits. Some storage devices, Windows Network Shares most notably, do not deal with this large number of concurrent requests well. They can drop connections, or take unrealistically long to process transactions in certain circumstances. By throttling the concurrent I/O transactions in the PVS Service, better performance can be achieved with these types of devices.

Local device is defined as any device starting with a drive letter. Remote is defined as any device starting with a UNC server name. This is a simple way to achieve separate limits for network shares and for local drives.

If you have a slow machine providing a network share, or slow drives on the machine, then a count of 1 to 3 for the remote limit may be necessary to achieve the best performance with the share. If you are going to fast local drives, you might be able to set the local count fairly high. Only empirical testing would provide you with the optimum setting for a given hardware environment. Setting either count to 0 disables the feature and allows the PVS Service to run without limits. This might be desirable on very fast local drives.

If a network share is overloaded, you’ll see a lot more device retries and reconnections during boot storms. This is caused by read/write and open file times > 60 seconds. Throttling the concurrent I/O transactions on the share reduces these types of problems considerably.”
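
To illustrate what such a limit does conceptually, here is a minimal Python sketch of throttling outstanding I/O against a store with a counting semaphore. It is not the PVS implementation; the class name, the vDisk path and the limit value of 40 are assumptions for illustration only.

# Conceptual sketch (not the PVS implementation): limiting the number of
# outstanding I/O requests against a storage device, similar in spirit to the
# Concurrent I/O limit settings of the Stream Service.

import threading

class ThrottledStore:
    def __init__(self, base_path: str, concurrent_io_limit: int):
        self.base_path = base_path
        # A limit of 0 would mean "no throttling"; here we model only the
        # throttled case with a counting semaphore.
        self._slots = threading.BoundedSemaphore(concurrent_io_limit)

    def read_block(self, offset: int, length: int) -> bytes:
        # Each worker thread must acquire a slot before touching the store,
        # so at most `concurrent_io_limit` requests are in flight at once.
        with self._slots:
            with open(self.base_path, "rb") as f:
                f.seek(offset)
                return f.read(length)

# Example: many streaming threads, but never more than 40 concurrent reads
# hitting the (hypothetical) vDisk store at once.
store = ThrottledStore(r"D:\vDisks\xenapp.vhd", concurrent_io_limit=40)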

For typical PVS implementations within enterprise environments I set it to 40 right at the start and increase the value in increments of 20 until streaming performance degrades or I end up setting it to 0 (i.e. disabling the limit entirely). As this obviously cannot be done within a production environment, it is a typical Build & Test / Pilot task.

When testing the feature, keep in mind that PVS uses the Windows System Cache for caching parts of the vDisk (please see http://community.citrix.com/x/gYpgCQ for further information). So when testing an increased I/O limit value, the cache needs to be emptied (server reboot), as otherwise you will not see any difference immediately.