My latest adventure into density testing for Citrix solutions centers on XenApp (XA) Hosted Shared desktops. For those unfamiliar with the XA Hosted Shared model, it combines application and session virtualization, relying on a single OS instance on a Citrix XA server to publish familiar Windows desktops and applications. The Hosted Shared model centralizes application delivery and management (securing data and applications in the data center), and it scales well to support large user densities. It is designed to provide a locked-down, streamlined, and standardized environment with a core set of applications, making it ideally suited for task workers where personalization is less of a focus. While the XA server can be hosted directly on a bare-metal server on Windows Server 2008 R2, my recent testing explores the densities you can achieve when XA runs in a virtualized environment.
Hosted Shared desktops are ideally suited for task workers who require a static application portfolio, as opposed to knowledge workers who require a broader or more customized environment. Typically, Hosted Shared desktops are only one part of a multi-pronged approach to desktop and application delivery. Citrix’s FlexCast™ technology can customize delivery and tailor it to meet the performance, security, and flexibility requirements of each use case. You can view different FlexCast use cases at http://flexcast.citrix.com, which compares and contrasts different delivery models and supplies a tool to help you characterize site-wide configuration needs.
Why Virtualize XenApp Services for Hosted Shared Desktops?
There are advantages to deploying Citrix XA in a virtualized configuration. Running Citrix XA services and Windows Server 2008 on a hypervisor makes it easy to consolidate multiple server and application silo instances on a single physical machine. Consolidation helps to simplify management and optimize resource utilization, the traditional reasons why IT often pursues virtualization initiatives. Although XA is supported on Microsoft Hyper-V, Citrix XenServer, and VMware vSphere hypervisors, I selected Microsoft Hyper-V for the first round of tests.
The goal of my initial testing was to explore the user densities you could expect with the Hosted Shared model when it’s virtualized with Hyper-V. As the testing demonstrated, you can achieve acceptable user response times and good scalability when virtualizing XenApp services on Hyper-V.
This testing is part of ongoing research to define sizing guidelines and recommendations for internal, partner, and customer-driven deployments. I’ll post updates and additional blogs as this research continues, so stay tuned for further results from my work.
The test environment included these core components:
- HP ProLiant DL380p Gen8 Server. This dual-socket server had two Intel Xeon E5-2680 processors clocked at 2.70GHz and 192GB RAM. The storage configuration featured an HP Smart Array 6 Gb/s PCIe 3.0 SAS controller and eight 10,000-RPM SAS disks set up as RAID 0+1 volumes.
- Citrix XenApp v6.5. Citrix XA was configured with the default HDX settings, which include Flash Redirection disabled for server-side video rendering. The session resolution was set to 1024×768.
- Microsoft Windows Server 2008 R2 SP1 with Hyper-V. The test configuration used Microsoft Roaming Profiles (MSRP), and locally cached profiles were deleted during the logoff process.
- Login VSI 3.5. Login VSI is a load generation tool for VDI benchmarking that simulates production user workloads. For this testing, I selected the default Medium workload to simulate the desktop activity of a typical knowledge worker. Login VSI generates an office productivity workload that includes Office 2010 with Microsoft Word, PowerPoint, and Excel, Internet Explorer with a Flash video applet, a Java app, and Adobe Acrobat Reader.
For each test run, I followed this sequence of steps:
1) I used 10 Login VSI launchers, first verifying that they were ready for testing. I started a script that invoked PerfMon scripts to capture comprehensive system performance metrics, and then initiated the workload simulation portion of the test in which Login VSI launched desktop sessions at 30-second intervals.
2) Once the desktops were logged in, the steady state portion of the test began. During ramp-up and steady state, Login VSI tracked user experience statistics, looping through specific operations and measuring response times at regular intervals. Response times were used to determine Login VSIMax, the maximum number of users the test environment can support before performance degrades consistently.
3) After a specified amount of elapsed steady state time, Login VSI started to log off the desktop sessions. After all sessions were logged off, I stopped the performance monitoring scripts.
4) Lastly, I processed the Login VSI logs using VSI Analyzer and PAL, the PerfMon CSV tool, to produce the graphs and metrics that follow.
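As a rough illustration of how long the ramp-up phase in this sequence runs, the sketch below multiplies the launch interval by the number of sessions. It assumes the 220 sessions reported for the best supported run further down, and it assumes the 30-second interval applies across all launchers combined rather than per launcher; the function names are made up for this example.

```python
# Sketch: estimate the length of the Login VSI ramp-up phase.
SESSIONS_LAUNCHED = 220   # sessions launched in the best supported run (from the results)
LAUNCH_INTERVAL_S = 30    # seconds between session launches (from step 1)
LAUNCHERS = 10            # Login VSI launchers (from step 1)


def ramp_up_seconds(sessions: int, interval_s: int) -> int:
    """Elapsed time between the first and the last session launch.

    ASSUMPTION: one session starts every `interval_s` seconds overall.
    """
    return (sessions - 1) * interval_s


def sessions_per_launcher(sessions: int, launchers: int) -> float:
    """Average number of sessions each launcher has to start."""
    return sessions / launchers


total_s = ramp_up_seconds(SESSIONS_LAUNCHED, LAUNCH_INTERVAL_S)
print(f"Ramp-up lasts about {total_s} s ({total_s / 3600:.2f} h)")
print(f"Sessions per launcher: {sessions_per_launcher(SESSIONS_LAUNCHED, LAUNCHERS)}")
```

Under those assumptions the ramp-up alone takes close to two hours, which is worth keeping in mind when sizing the PerfMon capture window.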
To understand the impact of VM configuration, including virtual CPU (vCPU) and memory allocations, I conducted several test iterations with different configurations. The optimal supported configuration proved to be 8 VMs with 4 vCPUs and 16GB RAM per VM, which sustained a user density of 203 sessions. The table below shows the various permutations tested, including both supported and unsupported configurations. (Microsoft officially supports up to 4 vCPUs per VM. Out of curiosity, I tested configurations that exceeded this restriction, but such configurations should never be deployed in production.)
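To see how the optimal supported layout maps onto the host, the arithmetic can be sketched as follows. The core count is an assumption: the post does not state it, and 8 cores per Xeon E5-2680 is used here. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostConfig:
    sockets: int
    cores_per_socket: int   # ASSUMPTION: 8 cores per Xeon E5-2680 (not stated in the post)
    threads_per_core: int   # 2 when HyperThreading is enabled
    ram_gb: int


@dataclass(frozen=True)
class VmLayout:
    vm_count: int
    vcpus_per_vm: int
    ram_gb_per_vm: int


def summarize(host: HostConfig, layout: VmLayout, sessions: int) -> dict:
    """Derive simple ratios for a VM layout on a given host."""
    physical_cores = host.sockets * host.cores_per_socket
    logical_processors = physical_cores * host.threads_per_core
    total_vcpus = layout.vm_count * layout.vcpus_per_vm
    vm_ram_gb = layout.vm_count * layout.ram_gb_per_vm
    return {
        "logical_processors": logical_processors,
        "total_vcpus": total_vcpus,
        "vcpu_to_lp_ratio": total_vcpus / logical_processors,
        "vm_ram_gb": vm_ram_gb,
        "host_ram_left_gb": host.ram_gb - vm_ram_gb,
        "sessions_per_vm": sessions / layout.vm_count,
        "sessions_per_core": sessions / physical_cores,
    }


host = HostConfig(sockets=2, cores_per_socket=8, threads_per_core=2, ram_gb=192)
layout = VmLayout(vm_count=8, vcpus_per_vm=4, ram_gb_per_vm=16)
summary = summarize(host, layout, sessions=203)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these inputs, the 8 VMs use as many vCPUs as the host has logical processors and leave part of the 192GB unallocated, at roughly 25 sessions per VM.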
I also wanted to explore how two other configuration choices impacted user density:
- Intel® HyperThreading (HT) technology. This processor technology provides multiple virtual threads per core, increasing performance for highly threaded applications. I ran a test iteration with HT turned off, which reduced density by about 23%.
- Hyper-V Dynamic Memory. Hyper-V allows you to share and reallocate memory automatically among running virtual machines. I reran the XenApp test defining 4GB of startup memory per VM, a 16GB maximum, and a 20% buffer. During the test run, each VM consumed approximately 9GB of memory, and user density decreased by about 3% (with about a 1% deviation).
Based on these tests, turning on HyperThreading proves to be an advantage, while Hyper-V Dynamic Memory appears to be only a slight disadvantage for session density. Of course, you should always conduct your own proof-of-concept evaluation using a representative workload for your own environment.
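To put those percentages in terms of sessions, here is a minimal sketch that applies each reduction to the 203-session result. It assumes both percentages are measured against that same baseline, which the post does not state explicitly for the HT test; the function name is made up for this example.

```python
BASELINE_VSIMAX = 203  # best supported configuration (from the post)


def density_after_drop(baseline: float, drop_pct: float) -> float:
    """Sessions remaining after a percentage reduction in density."""
    return baseline * (1 - drop_pct / 100)


# ASSUMPTION: both reductions are relative to the 203-session baseline.
ht_off_sessions = density_after_drop(BASELINE_VSIMAX, 23)
dynamic_memory_sessions = density_after_drop(BASELINE_VSIMAX, 3)

print(f"HT disabled:    ~{ht_off_sessions:.0f} sessions")
print(f"Dynamic Memory: ~{dynamic_memory_sessions:.0f} sessions")
```

The Dynamic Memory estimate lines up with the 197 VSIMax reported for the separate Dynamic Memory run later in the post.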
Detailed results for the test run that provided the optimal supported user density are given below. In this test run, Login VSI launched 220 sessions and achieved VSIMax at 203 sessions.
The test configuration followed these general metrics:
- Storage read/write ratio
  - Average: 7/93 reads/writes
  - Max: 89/11 reads/writes
- Storage average IOPS
  - Per Desktop Average: ~3.25
  - Per Desktop Max: ~8.09
It’s important to note that the storage data above reflects all phases of testing (logons, steady state, and logoffs). While you can use these results as general sizing guidelines, remember that you must tailor every configuration to match specific workload types and user capacities.
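For a sense of what the per-desktop figures mean at host level, the sketch below scales them to the full session count. It assumes the per-desktop values were derived from the 203 sessions at VSIMax (they may instead be based on the 220 sessions launched), and it applies only the average read/write ratio to the average IOPS, since the maximum ratio and maximum IOPS need not occur at the same moment. The function names are hypothetical.

```python
SESSIONS = 203                 # ASSUMPTION: per-desktop figures are based on VSIMax sessions
AVG_IOPS_PER_DESKTOP = 3.25    # from the post
MAX_IOPS_PER_DESKTOP = 8.09    # from the post
AVG_READ_PCT = 7               # 7/93 reads/writes, from the post


def aggregate_iops(per_desktop_iops: float, sessions: int) -> float:
    """Scale a per-desktop IOPS figure to the whole host."""
    return per_desktop_iops * sessions


def split_reads_writes(total_iops: float, read_pct: float) -> tuple:
    """Split total IOPS into (reads, writes) using a read percentage."""
    reads = total_iops * read_pct / 100
    return reads, total_iops - reads


avg_total = aggregate_iops(AVG_IOPS_PER_DESKTOP, SESSIONS)
max_total = aggregate_iops(MAX_IOPS_PER_DESKTOP, SESSIONS)
avg_reads, avg_writes = split_reads_writes(avg_total, AVG_READ_PCT)

print(f"Average host IOPS: {avg_total:.0f} ({avg_reads:.0f} reads, {avg_writes:.0f} writes)")
print(f"Peak host IOPS:    {max_total:.0f}")
```

The write-heavy average is the number to watch when choosing a RAID level or cache policy for a similar workload.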
The figure below shows Login VSIMax of 203 sessions.
Logical Processor Run-Time
This metric records utilization of the physical processors in the host computer. The “\Hyper-V Hypervisor Logical Processor(*)\% Total Run Time” performance counter is more accurate than the system’s “% Processor Time” counter, which only measures host processor time, so it is the best counter to use when analyzing overall processor utilization of the Hyper-V server. The graph shows the exhaustion of CPU resources under peak user density.
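If you export a PerfMon log to CSV, a counter like this one can be summarized with a few lines of Python. The sketch below is not the tooling used for this testing; it assumes the usual PerfMon CSV layout (a timestamp column followed by one column per counter path), uses the `_Total` instance, and runs on a tiny synthetic sample whose host name and values are invented purely for illustration.

```python
import csv
import io

# Counter path suffix to look for; the _Total instance is an assumption.
COUNTER_SUFFIX = r"Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time"

# Synthetic sample, NOT data from the test run. HOST01 is a hypothetical host name.
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\HOST01\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time"
"01/01/2013 10:00:00.000"," "
"01/01/2013 10:00:15.000","42.5"
"01/01/2013 10:00:30.000","97.5"
'''


def summarize_counter(csv_text: str, counter_suffix: str = COUNTER_SUFFIX) -> dict:
    """Return the average, maximum, and sample count for one counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    if not matches:
        raise ValueError(f"counter not found: {counter_suffix}")
    column = matches[0]

    samples = []
    for row in reader:
        if column >= len(row):
            continue  # short or empty row
        try:
            samples.append(float(row[column]))
        except ValueError:
            continue  # blank value, e.g. the first sample of a rate counter
    if not samples:
        raise ValueError("no numeric samples for this counter")
    return {"avg": sum(samples) / len(samples), "max": max(samples), "samples": len(samples)}


stats = summarize_counter(SAMPLE)
print(stats)
```

The same function works for other counters if you pass a different `counter_suffix`.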
Available MBytes is the amount of physical RAM, in megabytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. If this counter is low, then the computer is running low on physical RAM. This diagram shows that the 192GB RAM configured in the server supplies sufficient memory resources throughout the test run.
Network Interface Bytes Total/second is the combined rate at which bytes are sent and received over the single server NIC, including framing characters.
Disk Queue Length
The Average Disk Queue Length is the average number of both read and write requests that were queued for the C: disk during the measured interval.
The Calculated IOPS metric graphed below represents the combined rate of read and write operations on the C: disk. This graph is followed by graphs that show the rate at which bytes are transferred to or from the disk during read and write operations, respectively.
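The calculation behind a combined IOPS figure is simple addition of the read and write rates per sample. A minimal sketch, assuming the inputs come from paired read and write rate counters (such as PerfMon's Disk Reads/sec and Disk Writes/sec) sampled at the same instant; the input values here are arbitrary examples, not measurements from this test.

```python
def calculated_iops(reads_per_sec: float, writes_per_sec: float) -> float:
    """Combined rate of read and write operations for one sample."""
    return reads_per_sec + writes_per_sec


def read_write_ratio(reads_per_sec: float, writes_per_sec: float) -> tuple:
    """Return (read %, write %) for one sample; (0.0, 0.0) when the disk is idle."""
    total = reads_per_sec + writes_per_sec
    if total == 0:
        return 0.0, 0.0
    read_pct = 100 * reads_per_sec / total
    return read_pct, 100 - read_pct


# Arbitrary example sample, not taken from the test data.
iops = calculated_iops(50.0, 150.0)
read_pct, write_pct = read_write_ratio(50.0, 150.0)
print(f"IOPS: {iops}, reads: {read_pct}%, writes: {write_pct}%")
```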
Hyper-V Dynamic Memory (separate test from above data, 197 VSI Max)
The statistics and graphs below describe the behavior of Hyper-V dynamic memory.
The Added Memory counter shows the cumulative amount of memory added to the VMs over time. The table and graph below show the overall statistics of each of the Added Memory counter instances. Avg and Max are the average and maximum values in the entire log.
The Removed Memory counter shows the cumulative amount of memory removed from the VMs. The table and graph below show the overall statistics of each of the Removed Memory counter instances. Avg and Max are the average and maximum values in the entire log.
The Average Pressure counter represents the average pressure in the VM. The table and graph below show the overall statistics of each of the counter instances. Min, Avg, and Max are the minimum, average, and maximum values in the entire log.
The aim of this testing was to demonstrate what densities you can expect when deploying Citrix XenApp Hosted Shared desktops on Hyper-V. Virtualizing XA in this way provides good scalability along with the advantages of simplified management and optimized utilization.