“Are we there yet?” For weeks I’ve felt like a kid on a long road trip, wondering when we would finally arrive. In my case, however, the “destination” is the release of a new Cisco Validated Design (CVD) for Citrix XenDesktop 7. Over the summer I collaborated with engineers from Cisco, Microsoft, and EMC to design a solution and conduct scalability tests, sizing different configurations and seeing just how well they could scale. After that we extensively documented the architecture, configurations, and test results so that you’ll be able to replicate the solution more easily and deploy more quickly.

Cisco has published the end result of our efforts — a CVD describing an affordable and scalable VDI reference architecture based on Citrix XenDesktop 7. You can download the full 400-page CVD here, but you may want to start with this blog, which will give you the highlights.

What’s significant about this CVD is that it is the first of several developed using the new XenDesktop 7 product architecture. Earlier XenDesktop and XenApp releases used separate deployment methods and servers to deliver hosted shared desktops and applications (RDS) or hosted virtual desktops (VDI). XenDesktop 7 follows a unified architecture that allows you to provision both RDS and VDI desktops and applications using the same interfaces and policies, which helps to simplify infrastructure deployment. The new XenDesktop design also includes substantial HDX enhancements that improve the user experience, especially for mobile devices such as laptops, tablets, and smartphones.

A Reference Architecture that Scales

The CVD describes a virtual desktop infrastructure solution that scales cost-effectively from 500 to 1000 seats. The infrastructure is fully virtualized on Microsoft Windows Server 2012 with Hyper-V and hosted on Cisco UCS B200 M3 blade servers and an EMC VNXe3300 storage array. Citrix Provisioning Server 7 manages desktop images for a mixed workload of hosted shared (RDS) and hosted virtual desktops (VDI), which is common for many customer scenarios.

The CVD outlines dense and scalable configurations that are ideal for small to mid-size deployments. The base configuration supports up to 500 users using a single chassis and four blades, with the compute and networking components occupying only 8 rack units. Adding a second chassis (with seven blades across both chassis) permits scaling to support as many as 1000 users, at the same time adding chassis redundancy. The EMC VNXe3300 storage system supplies cost-efficient, consolidated storage that accommodates the validated workloads.


Testing Methodology

To validate the solution, we captured comprehensive metrics for the entire virtual desktop lifecycle: desktop boot-up and user log-in to virtual desktops (ramp-up), user workload simulation (steady state), and user log-offs. To generate load within the environment, we used Login VSI software from Login Consultants (www.loginvsi.com) to generate desktop connections, simulate application workloads, and track application responsiveness. We used the default Medium workload for Login VSI 3.7, representing office productivity tasks for a “normal” knowledge worker.

For each test run, we started performance monitoring scripts to track resource consumption for infrastructure components (UCS hosts with Hyper-V, XenDesktop Delivery Controllers, PVS servers, StoreFront servers, the SCVMM server, SQL servers, AD servers, client launchers, and EMC I/O controllers). To begin the testing, we took all desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers initiated the desktop sessions and began user logins, constituting the ramp-up phase. Once all users were logged in, the steady state portion of the test began in which Login VSI executes an application workload that includes Microsoft Office, Internet Explorer with Flash, printing, and PDF viewing.

Login VSI loops through specific operations and measures response times at regular intervals. These response times determine VSImax, the maximum number of users that the test environment can support before performance degrades consistently. Because baseline response times vary with the virtualization technology in use, a dynamically calculated threshold provides greater accuracy for cross-vendor comparisons; for this reason, the Login VSI software also reports VSImax Dynamic.
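To make the idea of a dynamically calculated threshold concrete, here is a minimal sketch. This is an illustration only, not Login VSI’s actual algorithm; the function name, the baseline window, and the multiplier/offset values are all assumptions chosen for clarity.

```python
# Illustrative sketch (NOT Login VSI's actual formula): a "VSImax Dynamic"
# style calculation derives its pass/fail threshold from the measured
# baseline instead of using a fixed response-time cutoff.

def vsimax_dynamic(samples, multiplier=1.25, offset_ms=1000):
    """samples: list of (session_count, response_ms) tuples, ordered by
    increasing session count. The baseline comes from the lightest-loaded
    measurements, so the threshold adapts to the platform under test."""
    window = samples[:5]
    baseline = sum(ms for _, ms in window) / len(window)
    threshold = baseline * multiplier + offset_ms

    # The supportable maximum is the highest session count whose measured
    # response time still falls at or below the dynamic threshold.
    vsimax = 0
    for sessions, response in samples:
        if response <= threshold:
            vsimax = max(vsimax, sessions)
    return vsimax, threshold
```

With a baseline of ~800 ms and response times that grow as load increases, the reported maximum lands wherever the curve crosses the computed threshold rather than at an arbitrary fixed value.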

Our testing consisted of single server and multiple server scalability tests, organized as a three-phase process with the following goals:

  1. Determining single server scalability limits. This phase determined Login VSImax for each workload (RDS or VDI) on a single blade. For each workload, user density was scaled up until Login VSImax was reached, which typically occurred when CPU utilization hit 100%.
  2. Validating single server scalability under a maximum recommended load. This phase validated a given density level for a single blade. The maximum recommended density is the level at which CPU utilization peaks at 90-95%.
  3. Determining a workload mix and validating multiple server scalability. First we defined a ratio of RDS to VDI workloads based on the earlier single server scalability results. In subsequent testing we examined how the solution behaved under that mixed workload, first on a single blade and then on multiple blades at densities of 500 and 1000 users. Configuring a mixed workload on each blade reduced the overall number of blades required while providing a fault-tolerant configuration (since every blade supported workloads of both types). This enabled the most cost-effective overall configuration, especially for sites with smaller user counts.

Main Findings

In the first phase of testing, we found Login VSImax for hosted shared desktop sessions (RDS) and then hosted virtual desktops (VDI) on a single blade.

Phase 1: Single Server Scalability, Single Workload

We started by testing hosted shared desktop sessions on Windows Server 2012 with XenDesktop 7. By testing different combinations of virtual machines and vCPUs assigned to those machines, we found that the best performance occurred when adequate CPU resources were available — that is, when the number of vCPUs assigned to the virtual machines did not exceed the number of hyper-threaded cores on the server. The blade’s dual Intel Xeon E5-2697 v2 (“Ivy Bridge”) processors provide 24 physical cores, or 48 logical cores with hyper-threading enabled. To achieve the highest density, we configured eight Windows Server 2012 virtual machines with six vCPUs each.
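The sizing rule above reduces to simple arithmetic; this quick sketch uses the core counts stated in the text for the B200 M3 blade:

```python
# vCPU sizing check for the RDS host configuration described above:
# total vCPUs assigned to VMs should not exceed the blade's
# hyper-threaded (logical) core count.

physical_cores = 24                   # two E5-2697 v2 sockets, 12 cores each
logical_cores = physical_cores * 2    # hyper-threading doubles schedulable threads

vms = 8
vcpus_per_vm = 6
total_vcpus = vms * vcpus_per_vm

# 48 vCPUs across 48 logical cores: fully subscribed, but not oversubscribed
assert total_vcpus <= logical_cores
print(total_vcpus, logical_cores)  # 48 48
```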

Using this configuration, we launched 320 sessions and achieved a VSImax score of 299 (Figure 1) in the single server scalability test of hosted shared desktops on a single Cisco UCS B200 M3 blade. CPU utilization was the gating factor.

Figure 1: Hosted Shared Desktops, Single Server Results

We then looked at single server scalability for hosted virtual desktops running Microsoft Windows 7 SP1 (32-bit). As shown in Figure 2, we launched 219 sessions and achieved a VSImax score of 205 on a single Cisco UCS B200 M3 blade. Again, CPU utilization was the gating factor.

Figure 2: Hosted Virtual Desktops, Single Server Results

In both the RDS and VDI single server tests, CPU resources were the limiting factor — as should be the case in any correctly configured scalability test. The single server scalability we saw was impressive, especially compared with earlier-generation CPUs. The same workload was previously benchmarked on dual “Sandy Bridge” E5-2690 processors, resulting in 30% (RDS) and 24% (VDI) lower user density per B200 M3 blade. In testing with “Ivy Bridge” processors for this CVD, we could scale the number of users significantly higher before CPU utilization became an issue. The table below compares densities (represented by Login VSImax) for similar single server test runs using Sandy Bridge and Ivy Bridge processors.

| Use Case (Single Server Testing) | Intel Xeon E5-2690 “Sandy Bridge” | Intel Xeon E5-2697 v2 “Ivy Bridge” | % Difference |
|---|---|---|---|
| RDS (Login VSImax) |  | 299 | 30% |
| VDI (Login VSImax) |  | 205 | 24% |
Phase 2: Single Server Scalability, Mixed Workload

After testing single server scalability, the next step was to identify the best workload mix. Mixing workloads on a per-blade basis reduces the total number of blades required while maintaining a diverse set of workloads; only one extra blade is then needed for redundancy. Segmenting workloads and use cases onto dedicated sets of servers is practical in larger solutions, but hard to accommodate in architectures comprising only a few servers.

Based on the single server testing, we determined that the optimal ratio was 70% hosted shared desktops and 30% hosted virtual desktops. To support a total of 250 users per blade, we configured 175 task users across six hosted shared desktop VMs along with 75 users on hosted virtual desktop VMs. In testing a single server with this configuration, VSImax was not reached, as shown in Figure 3 below. Under this load, CPU utilization reached about 90-95%, within the maximum recommended density guideline.
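The per-blade split is straightforward to derive from the ratio; a small sketch using the numbers from the text:

```python
# The 70/30 mixed-workload split described above, applied to the
# stated per-blade capacity of 250 users.

users_per_blade = 250
rds_ratio = 0.70                                 # hosted shared desktops (RDS)

rds_users = round(users_per_blade * rds_ratio)   # task users on RDS sessions
vdi_users = users_per_blade - rds_users          # hosted virtual desktops

print(rds_users, vdi_users)  # 175 75
```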

Figure 3: Mixed Workload, Single Server Results for Recommended Load

Phase 3: Multiple Server Scalability, Mixed Workload

From the mixed-workload, single-server results above, we can extrapolate larger configurations and validate scalability by conducting additional multiple server test runs. Given our single blade configuration for 250 users, we configured 2 blades for the 500-user workload and 4 blades for the 1000-user workload. By adding one blade for fault tolerance, the solutions under test become 3 blades for 500 users and 5 blades for 1000 users (excluding the two additional infrastructure blades).
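The building-block extrapolation above can be sketched as a one-line sizing function; `workload_blades` is a hypothetical helper name, with the per-blade capacity and single spare blade taken from the text:

```python
import math

# Blades required for a target user count at 250 mixed-workload users
# per blade, plus one spare blade for fault tolerance. Infrastructure
# blades are counted separately, as in the CVD.

def workload_blades(users, users_per_blade=250, spare_blades=1):
    return math.ceil(users / users_per_blade) + spare_blades

print(workload_blades(500), workload_blades(1000))  # 3 5
```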

The graphs below show test results for the 500-user and 1000-user configurations. In both cases, VSImax was not reached, meaning that performance never degraded beyond an acceptable level for the given number of users.

Figure 4: Mixed Workload, Multiple Server Results, 500 Users

Figure 5: Mixed Workload, Multiple Server Results, 1000 Users

Taking Advantage of Tiered Storage

The EMC VNXe3300 storage system provides cost-effective, centralized storage pools that can be allocated according to performance requirements. For our 500- and 1000-user scalability testing, user home directories and profiles were stored as CIFS shares on the storage system’s 15K rpm SAS drives, while solid state drives (SSDs) handled PVS write caching because of their high I/O performance. This tiered configuration offloads the intensive I/O generated by non-persistent data while maintaining adequate performance during the ramp-up, steady-state, and log-off periods of Login VSI testing, even at 1000 users, as shown in Figure 6 below. Note that Figure 6 shows IOPS measured during the 1000-user test, but only for the CIFS shares on the two storage processors (SPA and SPB) that serve user home directories and profiles.

Figure 6: EMC VNXe3300 Performance 1000-Users Login VSI

Beyond the Numbers

The full CVD contains a variety of performance metrics (CPU, memory, network, and I/O) for the 500- and 1000-user scalability tests under a mixed RDS/VDI workload. The results indicate close to linear scalability across the reference architecture: each additional blade server provides the capacity needed to host another 250 users. Because the reference architecture scales so well, customers can easily deploy small, tactical RDS/VDI solutions and add more blades (in a building-block approach) to increase scale.

This CVD is remarkable for a number of reasons:

  • The CVD is the first one to be released for Citrix XenDesktop 7, so customers who follow this architecture can enjoy a single set of tools to configure both RDS and VDI workloads.
  • The CVD is designed to meet the needs of small to mid-size businesses, with excellent scalability to meet growth. The defined configurations include additional blades for fault tolerance.
  • It supports a mixed workload per blade, which creates a highly cost-effective solution. In addition, local tiered storage — the EMC VNXe3300 storage array — optimizes storage-related costs for a 1000-user solution.
  • Because of the high-performance design of the Cisco UCS B200 M3 blade servers, the CVD demonstrated record densities for Citrix XenDesktop 7 with a mix of RDS and VDI workloads.

While it took many months and man-hours to bring this CVD to life, the results were certainly worth waiting for. Again, to read more about the solution, performance metrics, and test results, see the full CVD report here.

— Frank Anderson, Principal Solutions Architect with Citrix Worldwide Alliances