With the addition of Zones in XenApp/XenDesktop 7.7, I wanted to dig a little more into the impact of latency on brokering performance.

Both Craig and William have talked about zones already, but I’m going to dive into just one area: brokering performance with latency.

For the majority of end users, enumerating and launching resources is something they’ll do every day. With the addition of zones, we’re allowing users to be on higher latency links, so long as there’s a local broker.

With this additional latency, there will, inevitably, be an impact on end user experience. For the majority of work that users will do, they’ll see the expected slowness that’s linked to round-trips between the satellite brokers and the SQL database.

However, for launching apps, there is a pain point in actually brokering sessions. This pain point is due to the need to pick the lowest-loaded VDA on which to launch an app. This occurs within a database transaction and needs a snapshot of all the current loads on the VDAs within the Delivery Group. To achieve this, a lock is taken out on all the workers in the Delivery Group, which stops other users from taking the same locks (i.e. it causes serialization). It also waits on and blocks out worker state changes (e.g. session events).

With low latency, the delay between taking the locks and releasing them is very small. However, as latency increases, so does the time the locks are held, and so the time to broker sessions increases.
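
To make the serialization effect more concrete, here's a rough, illustrative model (not the actual broker code): if the load-balancing lock is held for a few database round trips, the maximum brokering rate scales inversely with RTT, and every launch queued behind the lock waits its turn. The three-round-trip figure below is purely an assumption for illustration.

```python
# Rough model of serialized brokering: while the load-balancing lock is held,
# other launch requests queue behind it, so throughput is capped by how long
# one transaction holds the lock. The "3 round trips under lock" figure is an
# illustrative assumption, not a measured value.
def rough_broker_ceiling(rtt_s, round_trips_under_lock=3):
    """Approximate brokering requests/s when the lock is the bottleneck."""
    return 1.0 / (rtt_s * round_trips_under_lock)

for rtt_ms in (10, 45, 90, 160, 250):
    print(f"{rtt_ms:>3}ms RTT -> roughly {rough_broker_ceiling(rtt_ms / 1000):.1f} requests/s")
```

With that assumption the estimates happen to land close to the measured brokering rates in the higher-latency tables below (roughly 3.7/s at 90ms, 2.1/s at 160ms and 1.3/s at 250ms), while at 10ms other costs dominate and the ceiling isn't reached.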

To back this up, we’ve looked at a variety of latencies and launch rates. The latencies are the round-trip times (RTTs) and were based on Verizon IP Latency Statistics. Note that most real-world RTTs are lower than the maximum values listed, but we wanted to make sure that we were testing with some useful RTTs.

Round-trip times of 10ms cover most intra-country delays. 45ms covers North America, Europe and Intra-Japan; 90ms covers Trans-Atlantic; 160ms covers Trans-Pacific, Latin America and Asia Pacific; finally, 250ms covers EMEA to Asia Pacific.

We tested with a variety of concurrent requests, covering values from 12 to 60 in increments of 12.

Note: the VDA sessions are simulated, as the testing is focused on the impact of latency on the broker. For this testing, there are 57 VDAs within one Delivery Group. Each test attempted to launch 10,000 users.

10ms RTT results
| Concurrent Requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average Response Time (s) | 0.9 | 1.4 | 1.6 | 2.1 | 2.6 |
| Brokering Requests per second | 14 | 17.8 | 22.9 | 23.2 | 22.9 |
| Errors (%) | 0 | 0 | 0 | 0 | 0 |
| Time to launch 10k users | 11m57s | 9m24s | 7m16s | 7m11s | 7m17s |

As expected, 10ms is fast enough to handle the loads placed upon the system. No errors were seen, and this was the fastest configuration for launching users. At the maximum launch rate of 60 concurrent requests, average response times were 2.6s, and launching all 10k users took a little over 7 minutes.
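
As a rough cross-check on how the rows in these tables relate, throughput is approximately the number of concurrent requests divided by the average response time, and total launch time is the user count divided by that throughput. A quick sketch using the 60-request column above (the figures are the measured ones; the arithmetic is just a back-of-the-envelope check):

```python
# Sanity check on the 10ms table: concurrency / average response time gives
# the brokering rate, and users / rate gives the total launch time.
concurrency = 60            # concurrent brokering requests
avg_response_s = 2.6        # average response time (s) from the 10ms table
users = 10_000

rate = concurrency / avg_response_s      # ~23.1 req/s (table reports 22.9)
total_minutes = users / rate / 60        # ~7.2 minutes (table reports 7m17s)
print(f"~{rate:.1f} requests/s, ~{total_minutes:.1f} minutes for 10k users")
```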

45ms RTT results
| Concurrent Requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average Response Time (s) | 1.7 | 3.1 | 4.3 | 6.4 | 7.3 |
| Brokering Requests per second | 7.1 | 7.8 | 8.4 | 7.5 | 8.2 |
| Errors (%) | 0 | 0 | 0 | 0.01 | 0.01 |
| Time to launch 10k users | 23m28s | 21m19s | 19m51s | 22m15s | 20m19s |

With 45ms, results were still good; at the very highest launch rates, 1 or 2 users out of 10,000 saw an error. Note: the impact of serialization can be seen in the response times, which increase from 1.7s to 7.3s to broker a session. Total time to broker 10k users was 20-23 minutes.

90ms RTT results
| Concurrent Requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average Response Time (s) | 2.9 | 6.4 | 9.5 | 12.9 | 16.2 |
| Brokering Requests per second | 4.1 | 3.7 | 3.8 | 3.7 | 3.7 |
| Errors (%) | 0 | 0 | 0 | 0.01 | 0.01 |
| Time to launch 10k users | 40m30s | 44m29s | 44m11s | 44m55s | 45m04s |

Again, the 90ms results saw few errors. However, the impact of transacting over latency becomes more obvious: users saw an acceptable average time of 2.9s to broker a session with 12 concurrent requests, increasing to a likely unacceptable 16.2s with 60 concurrent requests. In this case, it’s actually more advantageous to broker users at a lower rate. Logging all 10k users on took 40-45 minutes.

160ms RTT results
| Concurrent Requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average Response Time (s) | 5.7 | 11.4 | 17.3 | 23.2 | 28.0 |
| Brokering Requests per second | 2.1 | 2.1 | 2.1 | 2.1 | 2.1 |
| Errors (%) | 0 | 0 | 0.12 | 4.0 | 17.7 |
| Time to launch 10k users | 1h19m0s | 1h19m27s | 1h19m55s | 1h20m26s | N/A |

With 160ms, we start to see significant errors at higher launch rates: 4% errors at 48 requests and 17.7% at 60 requests, along with response times approaching 30s. However, up to 36 requests the error rate stays around 0.1%, with an average brokering time of about 17s. Note: it’s hard to judge the launch time for 60 requests, as a 17% failure rate is hard to factor in.

With this latency, we’d recommend not exceeding 24 concurrent requests. The size of the site may also be a factor: logging 1k users in would take ~8m, scaling up to 1h20m for 10k users. As such, we wouldn’t recommend a large site with this level of latency to the database.
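
To apply the same sizing logic to your own user counts, divide the expected logon-storm size by the brokering rate you can sustain at your latency. A minimal helper sketch, using the ~2.1 requests/s observed at 160ms as an example input:

```python
def estimated_launch_minutes(users, brokering_rate_per_s):
    """Estimate how long a logon storm takes at a sustained brokering rate."""
    return users / brokering_rate_per_s / 60

# At 160ms RTT the tables above show roughly 2.1 brokering requests per second.
print(estimated_launch_minutes(1_000, 2.1))    # ~7.9 minutes (~8m)
print(estimated_launch_minutes(10_000, 2.1))   # ~79.4 minutes (~1h20m)
```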

250ms RTT results
| Concurrent Requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average Response Time (s) | 9.3 | 15.4 | 26.7 | N/A | N/A |
| Brokering Requests per second | 1.3 | 1.6 | 1.3 | N/A | N/A |
| Errors (%) | 0 | 0 | 4.6 | 42.8 | 99.0 |
| Time to launch 10k users | 2h08m33s | 1h46m52s | 2h03m46s | N/A | N/A |

With such high latency, a large number of timeouts occurred at the higher concurrent launch rates. At 48 requests, 42.8% of requests failed, and at 60 requests timeouts were so common that the site would be unusable, with 99% of requests failing. This rendered the other data unhelpful, as the average response times reflect only the few successful requests.

The only acceptable launch rates were 12 and 24 requests. It would be hard to recommend deploying a large site with this level of latency: logging 1k users in took 13m with 12 concurrent requests and 11m with 24, and 10k users would take up to 2h8m.

Throttling requests

If you do need to work with high latency, and find that too many timeouts occur, a registry key was added in XenApp/XenDesktop 7.7 that limits the broker to a fixed number of concurrent brokering requests. Any request above the limit is answered with a response asking StoreFront to retry after a few seconds. This helps back off requests and, thus, reduces lock queuing. However, some users may end up seeing extended launch times if they’re unlucky and their requests are repeatedly backed off.

The key is a DWORD and should be stored in:

HKLM\Software\Citrix\DesktopServer\ThrottledRequestAddressMaxConcurrentTransactions

If the key doesn’t exist, then no limit is applied to brokering requests. Note: the key is per DDC, so the total number of requests hitting the SQL server needs to be split amongst the remote DDCs.
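
For reference, here's one way the value could be created from Python on a Delivery Controller (run elevated, Windows only). This is only a sketch based on the path above, assuming the final path component is the DWORD value name under the HKLM\Software\Citrix\DesktopServer key; the limit of 24 is an arbitrary example, not a recommendation.

```python
# Sketch: create the per-DDC brokering throttle described above.
# Assumes the DWORD value ThrottledRequestAddressMaxConcurrentTransactions
# lives under HKLM\Software\Citrix\DesktopServer; 24 is an example limit only.
import winreg

KEY_PATH = r"Software\Citrix\DesktopServer"
VALUE_NAME = "ThrottledRequestAddressMaxConcurrentTransactions"
LIMIT = 24  # example value; size it per DDC so the SQL server isn't overloaded

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, LIMIT)
```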

Summary

Brokering does work over latency, but the latency needs to be considered for sizing a remote zone. If a zone is large, it may still be desirable to keep a database local to that zone. If the zone is small, using a remote zone may work well and might also reduce management cost without impacting on the end user experience.

Note that we recommend your zones have less than 250ms RTT; beyond that, you should consider setting up separate sites.