My role in the Citrix Alliance Group focuses on performance testing, especially for VDI and SBC reference architectures developed in conjunction with alliance partners. I spend the bulk of my time developing and validating reference architectures so that customers can deploy Citrix solutions quickly and with less risk. While you might think my job sounds like mind-numbing number-crunching, I get tremendously jazzed about it, and about the theory behind performance analysis.

Recently, I got to thinking about ways to extend performance measurement beyond the infrastructure components. I wanted to capture the overall user experience and collect client and user data to augment other performance testing results. To that end, I came up with a method of capturing user experience (UX) data while running workloads that focus on the hosted solution’s performance and scalability.

User Experience Statistics

To capture UX data, I run a series of scalability-related tests using the Login VSI Medium with Flash workload, the same workload I’ve used in previous scalability tests with similar user capacities. This workload simulates the desktop activity of a typical knowledge worker, including Microsoft Office applications (Word, PowerPoint, Outlook, and Excel), Internet Explorer with a Flash video applet, a Java application, and Adobe Acrobat Reader.

The UX testing methodology differs from other scalability testing methods. For a 75-user test, for example, I configure 74 virtual desktop sessions connecting to 10 virtual client launchers, along with one dedicated launcher that runs on a physical system. This physical client serves as the “canary in the coal mine,” collecting protocol response times as well as frame rate statistics. Metric capture starts at the beginning of the test and continues throughout the entire Login VSI run. I also record screen-captured videos of the canary session as a visual aid.
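To make the capture process concrete, here’s a rough Python sketch of the kind of polling loop I’m describing. It assumes the canary client exposes its readings as Windows performance counters that typeperf can read; the counter paths, output file name, and sampling interval below are placeholders for illustration, not the exact counters or tooling I use.

```python
import csv
import subprocess
import time
from datetime import datetime

# Placeholder counter paths: substitute the EUEM/HDX counters exposed in your environment.
COUNTERS = [
    r"\ICA Session(*)\Latency - Session Average",   # assumed stand-in for ICART
    r"\ICA Session(*)\Output Frames/sec",           # assumed stand-in for FPS
]
SAMPLE_INTERVAL_S = 5          # one reading every 5 seconds
TEST_DURATION_S = 60 * 60      # capture for the full Login VSI run (here, 1 hour)

def sample_once():
    """Take one reading of each counter via a one-shot typeperf call."""
    cmd = ["typeperf", *COUNTERS, "-sc", "1"]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # typeperf emits CSV; the last non-empty line holds the sampled values.
    rows = [line for line in out.splitlines() if line.strip()]
    return rows[-1] if rows else ""

with open("canary_ux_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "raw_sample"])
    start = time.time()
    while time.time() - start < TEST_DURATION_S:
        writer.writerow([datetime.now().isoformat(), sample_once()])
        f.flush()                      # keep the file current in case the run is cut short
        time.sleep(SAMPLE_INTERVAL_S)
```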

The graphs below show data for three consecutive Login VSI workload loops. The purpose of the charts is to compare the first Login VSI test loop with the subsequent ones to gauge the user experience during the login/ramp-up and steady-state phases of the test. You can clearly see matching patterns with only a small amount of deviation between loops. From these similarities, you can infer that the overall user experience is maintained from loop to loop.

75-User Test Run – full test run (data for three consecutive VSI workload loops)


The above graphs are presented primarily for educational purposes, since the best way to compare subsequent loops is to average the data per loop and observe similarities and differences. Another useful comparison is to overlay the charted data for each loop and look for noticeable deviations; later charts in this blog illustrate this method of comparison.
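As a rough illustration of both comparisons, the Python sketch below averages ICART per loop and then overlays the loops on a single chart. The loop data is made up for the example; in practice it would come from the canary capture, split at the Login VSI loop boundaries.

```python
from statistics import mean
import matplotlib.pyplot as plt

# Example: three loops' worth of (seconds into loop, ICART ms) samples (placeholder data).
loops = {
    "loop 1": [(0, 110), (60, 125), (120, 118), (180, 130)],
    "loop 2": [(0, 115), (60, 128), (120, 122), (180, 135)],
    "loop 3": [(0, 112), (60, 130), (120, 125), (180, 138)],
}

# 1) Average per loop: the simplest way to compare loops.
for name, samples in loops.items():
    print(f"{name}: mean ICART = {mean(v for _, v in samples):.1f} ms")

# 2) Overlay the loops on a single chart to spot deviations visually.
for name, samples in loops.items():
    xs, ys = zip(*samples)
    plt.plot(xs, ys, label=name)
plt.xlabel("Seconds into loop")
plt.ylabel("ICART (ms)")
plt.legend()
plt.title("ICART overlaid per Login VSI loop")
plt.show()
```

The same approach works for FPS or any other per-second metric captured from the canary session.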

ICART and FPS Metrics

The above charts graph two metrics: ICA Roundtrip Time (ICART) and Frames Per Second (FPS). ICART is an internal EdgeSight End User Experience Monitoring (EUEM) counter that simulates a user typing or clicking the mouse. It systematically sends fake keystrokes from the client that traverse the ICA stack, making a full roundtrip through the Virtual Desktop Agent (VDA) and back to the point of origin, the client. For example, an ICART value of 500ms equates to a half-second typing response lag (a text character is sent and then received via the session as a bitmap). (More details about ICART and other EdgeSight EUEM metrics are available in the Citrix EdgeSight reporting community.)

The FPS counter measures the frames encoded by the host and transmitted to the clients. Note that the captured data reflects the transmitted frame rate rather than the received frame rate, so the FPS readings can vary by a few frames from second to second. However, transmitted frames always reach the clients, and the readings average out over time.
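To show how I read these two metrics, here’s a small Python sketch that summarizes a handful of ICART readings and smooths per-second FPS samples with a trailing rolling average. The sample values and the three-sample window are assumptions for illustration only.

```python
from statistics import mean

icart_ms = [95, 110, 130, 500, 120, 105]     # example ICART readings (ms)
fps = [22, 25, 19, 24, 23, 20, 26, 21]       # example transmitted-frame readings

def rolling_mean(values, window=3):
    """Trailing rolling mean to smooth per-second FPS jitter."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

print(f"Mean ICART: {mean(icart_ms):.0f} ms "
      f"(worst sample {max(icart_ms)} ms = {max(icart_ms) / 1000:.1f} s of typing lag)")
print("Smoothed FPS:", [round(float(v), 1) for v in rolling_mean(fps)])
```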

If the user experience degrades significantly, these two counters will detect it. There are a few scenarios where they can be quite helpful (a simple flagging sketch follows the list):

  • The virtual desktop host or XenApp server’s CPU reaches 100% utilization for a sustained period of time (e.g., a few seconds or more). This adds latency to ICART and affects FPS by slowing down the bitmap encoding process.
  • Network connectivity reaches its capacity limit, adding latency between the virtual desktops or XenApp server and the client. This adds latency to ICART and lowers FPS by slowing the rate at which frames are transferred to the client; some frames are dropped by the virtual desktop through HDX Queuing and Tossing, a key optimization in Citrix’s HDX broadcast technology that improves UX, especially over wide area networks.
  • The storage infrastructure used for the virtual desktops or XenApp server reaches its limit. Excessive disk latency, for example from a shortage of available IOPS, shows up in the UX counters.
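Here’s the flagging sketch I mentioned above the list. It looks for sustained breaches rather than one-off spikes, since a single slow sample isn’t meaningful; the thresholds and sample values are my own illustrative assumptions, not Citrix guidance.

```python
ICART_LIMIT_MS = 500      # assumed "noticeable typing lag" threshold
FPS_FLOOR = 10            # assumed "visibly choppy" threshold
SUSTAIN_SAMPLES = 3       # require consecutive bad samples, not a single spike

def sustained_breach(values, predicate, run_length=SUSTAIN_SAMPLES):
    """Return True if `predicate` holds for `run_length` consecutive samples."""
    streak = 0
    for v in values:
        streak = streak + 1 if predicate(v) else 0
        if streak >= run_length:
            return True
    return False

icart_ms = [120, 140, 650, 700, 720, 300]   # example: host CPU saturated mid-run
fps = [24, 22, 9, 8, 7, 18]                 # example: frames dropping under load

if sustained_breach(icart_ms, lambda v: v > ICART_LIMIT_MS):
    print("ICART degraded for a sustained period; check host CPU, network, and storage.")
if sustained_breach(fps, lambda v: v < FPS_FLOOR):
    print("Transmitted FPS collapsed; the encoding or transfer path is under pressure.")
```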

Establishing a Baseline

The test harness uses a dedicated virtual desktop with a physical client for the baseline. This “canary” client is the first to log in and start the Login VSI workload. While running the workload, the test rig records UX observations (ICART and FPS), taking readings throughout the complete test run, including the login and steady-state phases. The baseline represents single-session performance without any other sessions on the server and captures metrics for one complete VSI workload loop.
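Reducing the canary capture to baseline numbers can be as simple as averaging the samples that fall within that single loop. The Python sketch below shows the idea; the sample values are placeholders, and in practice I’d take the loop boundaries from the Login VSI logs.

```python
from statistics import mean

# Placeholder baseline-loop samples from the canary client (ICART in ms, transmitted FPS).
baseline_icart_ms = [98, 105, 112, 101, 95, 108]
baseline_fps = [24, 25, 23, 24, 26, 24]

baseline = {
    "icart_ms": mean(baseline_icart_ms),
    "fps": mean(baseline_fps),
}
print(f"Baseline ICART: {baseline['icart_ms']:.0f} ms, baseline FPS: {baseline['fps']:.1f}")
```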

Single-User Baseline Test with Login VSI Medium Workload

The baseline then becomes the point of comparison for subsequent Login VSI workload loops at specific user capacities, as shown in the graphs below. These graphs compare ICART and FPS metrics for the baseline against workload loops at 25, 50, 75, and 100 users.
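A quick tabular version of that comparison is easy to produce once per-loop means exist for each capacity. The sketch below reports percentage deviation from the baseline; all of the numbers are placeholders for illustration, not results from my test runs.

```python
# Per-loop means for each user capacity (placeholder values), compared against the
# single-session baseline computed as in the previous sketch.
baseline = {"icart_ms": 103, "fps": 24.3}

capacity_runs = {
    25:  {"icart_ms": 112, "fps": 23.8},
    50:  {"icart_ms": 121, "fps": 23.1},
    75:  {"icart_ms": 138, "fps": 22.0},
    100: {"icart_ms": 167, "fps": 20.4},
}

print(f"{'Users':>5}  {'ICART vs baseline':>18}  {'FPS vs baseline':>16}")
for users, run in capacity_runs.items():
    icart_delta = 100 * (run["icart_ms"] - baseline["icart_ms"]) / baseline["icart_ms"]
    fps_delta = 100 * (run["fps"] - baseline["fps"]) / baseline["fps"]
    print(f"{users:>5}  {icart_delta:+17.1f}%  {fps_delta:+15.1f}%")
```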

Supporting Test Video

To validate the collected metrics, I also record videos of the baseline and subsequent workload test loops. For example, in UX testing for a VDI-in-a-Box reference architecture, this video compares the user experience for the baseline and the last section of the 100-user test run. The video corroborates the UX metrics, illustrating that the UX of the fourth VSI workload loop is much the same as the baseline. You can read my blog about UX testing for this solution here.

Screenshot of test video 

Summary

My goal in UX testing is to validate the scalability of XenDesktop, XenApp, or VDI-in-a-Box solutions and to corroborate other test results from the user’s vantage point. While my toolset isn’t publicly available for you to replicate my methodology exactly, there are many commercially available tools for collecting UX data, including offerings from Citrix EdgeSight, Scapa, LiquidWare Labs, and even Login VSI (in its most recent release), among other vendors. I’ll continue to evolve my UX testing methodology as I further vet this new approach.

In the meantime, the UX metrics and videos give me a way to cross-check the VSIMax scores and PerfMon data I collect during scalability testing. Both supply valuable insight into what the user sees, providing a qualitative visual reference along with quantitative data, and allow me to confirm the configuration performance I’ve observed during other scalability testing.
