Hi Everyone,

My name is Tony Sanchez (@TonySanchez_CTX) and I work on the WW Alliances team as a principal solutions architect focusing on the HP Moonshot platform.

Citrix XenApp is now available on HP Moonshot with Intel graphics! Previously, I wrote a blog on the first Citrix and HP Moonshot offering with AMD, called the CS100 for Citrix XenDesktop. Since then the HP Moonshot and Citrix family has grown, and now for the first time Citrix XenApp 7.5 is available on the HP Moonshot platform! XenApp runs on the new ProLiant m710 cartridge, which is driven by the Intel Xeon E3 processor with integrated Iris Pro Graphics!

Many may ask what is so unique about Moonshot and what makes XenApp so special for this new hardware platform. Those are great questions that I’ll answer in this blog. Earlier in the year at Citrix Synergy 2014 we demonstrated the Moonshot ProLiant m710 for the first time publicly with our partners HP and Intel. If you didn’t get a chance to see the demo, I highly recommend that you check it out. There was an immense amount of interest from customers and partners, and since Synergy we have been working around the clock testing and building our new offering to enable a solution that is available today.

What’s impressive about HP Moonshot and this new ProLiant m710 cartridge is how we can enable an out-of-the-box experience that leverages the Intel Iris Pro Graphics for each XenApp user, all without having to use a hypervisor. This bare-metal architecture shortens setup and deployment time for XenApp, because there is no waiting on an underlying management platform to be enabled or a GPU to be virtualized. With all that in mind, let’s take a look at the ProLiant m710 and its Moonshot components, and then share the secret sauce that technical people will care a lot about: the scalability and performance data for medium and rich application workloads.

Compute and Graphics

The HP ProLiant m710 is the first Moonshot cartridge powered by the Intel® Xeon E3-1284Lv3 processor with integrated Iris Pro Graphics P5200. With integrated onboard processing and graphics, you can now easily deliver on-demand graphics for rich applications with Intel’s Iris Pro and, for intense number crunching, leverage Turbo Boost technology with speeds up to 3.2 GHz. Delivering graphics for rich applications or content that call for OpenGL, OpenCL, and WebGL has always been a challenge when graphics cards were not present. Now, with a solution like the HP Moonshot ProLiant m710 for XenApp, a user has on-demand graphics for those scenarios. In the past, IT needed to schedule time to take down servers and insert a GPU to enable that workload. With a microserver architecture like HP Moonshot and the ProLiant m710 cartridge, those challenges fade away.

For such a powerful and compact cartridge, it is surprising how few watts it uses. The ProLiant m710 cartridge requires only 19 watts to power on, which is about what a regular appliance light bulb uses, so hopefully the light bulb is going on in your head about how power savings can start to come into play. For those who want all the processor data, additional information about the Intel E3 chipset can be found here, and Intel has a great blog about it that you should read as well.

Memory

Each ProLiant m710 has 4 SODIMM slots with 8GB of DDR3L-1600 low-voltage memory per slot, for a maximum configuration of 32GB of RAM per cartridge. While 32GB of RAM is less memory than a full-size blade offers, it lets you create micro XenApp instances similar to what you would run as a virtualized instance of XenApp, but without the hypervisor. Previous scalability tests performed by Citrix show that many virtualized instances of XenApp on a hypervisor have around 8GB to 32GB of virtual RAM assigned, so 32GB of physical RAM is not far off from the hypervisor world. Of course, mileage may vary, as some customers assign more virtual RAM to applications that require it, so keep that in mind. Because the RAM is physically assigned to each cartridge, design and scalability planning are simpler: there is no need to worry about overcommitting virtual RAM and the impact it may cause.
Additional Quick Specs information about the ProLiant m710 cartridge can be found here.

Network

Do you have a need for speed? The ProLiant m710 cartridge delivers two integrated 10Gb Mellanox ConnectX®-3 network adapters, which also support RDMA over Converged Ethernet (RoCE). These adapters integrate seamlessly with the Intel QM87 chipset on the ProLiant m710 cartridge. With two 10Gb adapters, transfers of content such as videos, high-resolution images, and large files feel nearly instantaneous to users.

The HP Moonshot-45XGc Switch Module

The HP Moonshot-45XGc Switch Module is designed to provide high-speed, low-latency connectivity while dramatically reducing the cost and complexity of deploying solutions at scale. The 45XGc Switch Module, together with the HP Moonshot-4QSFP+ uplink module, provides 10GbE network connections to cartridges within the HP Moonshot 1500 chassis. Up to two switch modules are supported in each chassis. Multiple modules can be stacked to eliminate the cost of TOR switches and provide failover in the event of a switch or uplink failure.

More Quick Specs information about the 45XGc switch module can be found here.

Storage

Each ProLiant m710 comes with a Micron M500 120GB M.2 NGFF (Next Generation Form Factor) NAND drive for fast operating system reads and writes, while running at only around 3.3 volts. Each M.2 drive also uses adaptive thermal throttling, which allows the drive to reduce its temperature if it is being stressed. The maximum read rate for the M.2 is rated at around 500 MB/s, with writes at 130 MB/s, over a 6.0 Gb/s interface. The screenshot below shows a few SSD tests from the Blackmagicdesign, CrystalDiskMark, and ATTO Disk Benchmark applications.

Scalability

Of course the question on everyone’s mind is “how does it scale?” As I mentioned earlier, the performance data gathered from the HP Moonshot chassis manager, as well as from Perfmon counters, is critical when analyzing the overall chassis and XenApp user experience. Performing scalability tests is mandatory to see how well a system handles the move from a steady state to a fully loaded, active state for XenApp. Each test gathered data from common areas such as CPU, RAM, IOPS, network, and of course power. For this test, Citrix and HP used the industry-standard Login VSI version 4.1 tools to create a 2,400-user synthetic test. We attempted to ask 2,400 people on Facebook if they would participate, but everyone was busy, so we created 2,400 friends of our own to test ☺. Two types of workloads were used for the tests: the medium workload and the rich application workload. An explanation of each workload and its applications is below. Each test was executed using 1, 15, 30, and 45 cartridge loads to ensure that both a partially and a fully loaded chassis were exercised. For this blog we will focus only on the 1 and 45 cartridge scalability numbers. All of the information below can be found in the technical whitepaper HP for Moonshot for XenApp, coming soon, which I highly recommend that you read.

Medium Workload

This section describes the medium workload profile, used to evaluate XenApp performance on HP Moonshot with HP ProLiant m710 Server Cartridges. The medium workload is the default workload in Login VSI. This workload emulates a medium knowledge worker using Microsoft Office 2013, Microsoft Internet Explorer, PDFs, and Java/FreeMind.

Rich Application Workload

A separate application workload was designed to evaluate XenApp performance on HP Moonshot with HP ProLiant m710 Server Cartridges when running rich and graphics-intensive applications. This workload executes the GPU-enhanced features of Adobe Photoshop CC 2014 and manipulates a 3D model in eDrawings Viewer. The GPU-enhanced features of Adobe Photoshop include the Blur Gallery, Smart Blur, Upscale, Smart Sharpen, Lighting Effects, Rotate, and Scrubby Zoom, to name a few. The GPU uses OpenGL (Open Graphics Library) to render and accelerate 2D and 3D graphics and OpenCL (Open Computing Language) for parallel processing acceleration.

The rich application workload consists of three segments:
• Segment 1 runs OpenGL, OpenCL, and Zoom tests (via Adobe) in Photoshop CC 2014.
• Segment 2 opens a 3D assembly file and rotates, expands, and collapses the file multiple times.
• Segment 3 uses Scrubby Zoom to zoom in and out of an image multiple times.

The OpenGL test loads an image, applies Lighting Effects and Smart Blur, rotates the image, scales the image, applies Motion Blur, and finally applies Lighting Effects again. The OpenCL test loads an image, and then applies the Field Blur, Iris Blur, and Tilt Blur filters from the Blur Gallery. The Zoom test loads an image, and then uses Scrubby Zoom to rapidly zoom in and out of an image for 30 seconds. The image manipulated is a 17-megapixel, 48-MB TIF file. The 3D assembly file is a 63-component, 2-MB EASM file.

Single Cartridge Performance Data

Figure 4 shows the baseline response time vs. the number of user sessions for a single HP ProLiant m710 Server Cartridge in the XenApp delivery group. A VSI max of 50 XenApp users was achieved for this test.

Figure 4.

Figure 8 shows that the number of user sessions increases linearly as the number of cartridges increases, with response time remaining almost constant. This shows a high scalability of over 2,300 XenApp users in one chassis!
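
As a rough cross-check of the linear-scaling result, here is a minimal Python sketch that extrapolates the single-cartridge VSI max to the cartridge counts used in the tests. The function name is hypothetical, and the model assumes perfectly linear scaling with no shared bottleneck, which is a simplification rather than something the test data guarantees.

```python
def estimate_chassis_users(users_per_cartridge: int, cartridges: int) -> int:
    """Linear extrapolation: assume every cartridge hosts the same number of sessions."""
    if users_per_cartridge < 0 or cartridges < 0:
        raise ValueError("inputs must be non-negative")
    return users_per_cartridge * cartridges


# Single-cartridge VSI max (50 users) applied to the tested cartridge counts.
for count in (1, 15, 30, 45):
    print(f"{count:>2} cartridges -> {estimate_chassis_users(50, count)} users")
```

At 45 cartridges this simple model gives 2,250 users, a little below the 2,300-plus users measured on the full chassis, so treat it as a planning estimate only.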

Figure 9 characterizes processor utilization of the HP ProLiant m710 Server Cartridge. When the number of sessions approaches 49 or 50, CPU utilization goes up to 100 percent; when the user sessions start to log off, CPU utilization goes down.

Figure 10 characterizes power utilization of the HP ProLiant m710 Server Cartridge. The maximum power rating on each cartridge is less than 75 Watts at peak load. However, with the XenApp medium workload, the power utilization is ~60 Watts per cartridge at peak load.
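
To put the power numbers in per-user terms, here is a small Python sketch. The function name is hypothetical, and the calculation covers cartridge power only; chassis overhead such as switches and fans is not included, and the 50 users per cartridge comes from the single-cartridge VSI max above.

```python
def power_summary(watts_per_cartridge: float, cartridges: int, users_per_cartridge: int) -> dict:
    """Total cartridge power and watts per user, ignoring chassis overhead."""
    if cartridges <= 0 or users_per_cartridge <= 0:
        raise ValueError("cartridges and users_per_cartridge must be positive")
    total_watts = watts_per_cartridge * cartridges
    total_users = users_per_cartridge * cartridges
    return {
        "cartridge_watts_total": total_watts,
        "watts_per_user": total_watts / total_users,
    }


# ~60 W per cartridge at peak with the medium workload, 45 cartridges, 50 users each.
summary = power_summary(60.0, 45, 50)
print(f"{summary['cartridge_watts_total']:.0f} W total, {summary['watts_per_user']:.2f} W per user")
```

Under these assumptions a fully loaded chassis draws about 2,700 W across its cartridges, or roughly 1.2 W per XenApp user.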

Figure 12 characterizes main memory utilization of the HP ProLiant m710 Server Cartridge. Notice that 32 GB memory is not saturated, even at peak workload (when the maximum number of user sessions is running). Also note that 6–8 GB of memory is free at peak workload.
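
For sizing purposes, the memory figures can be turned into an approximate per-session footprint. This is a hedged Python sketch with a hypothetical function name; it divides all used memory, including the operating system's share, across the sessions, so the result is an upper bound on what each session actually consumes.

```python
def memory_per_session_gb(total_gb: float, free_gb: float, sessions: int) -> float:
    """Used memory divided evenly across sessions (OS overhead included)."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    if not 0 <= free_gb <= total_gb:
        raise ValueError("free_gb must be between 0 and total_gb")
    return (total_gb - free_gb) / sessions


# 32 GB per cartridge, 6 to 8 GB free at peak, 50 sessions (single-cartridge VSI max).
low = memory_per_session_gb(32, 8, 50)
high = memory_per_session_gb(32, 6, 50)
print(f"{low:.2f} to {high:.2f} GB per session")
```

With the numbers above, that works out to roughly half a gigabyte per medium-workload session.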

Figure 13 characterizes network utilization of the HP ProLiant m710 Server Cartridge. Notice that the network does not create a bottleneck. Even at the peak workload (when the maximum number of user sessions is running), the network utilization remains below 130 Mb/s. The ProLiant m710 Server Cartridge has a 10 Gbps network.
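
The network headroom is easy to quantify. The Python sketch below uses a hypothetical function name and treats the cartridge as having a single 10 Gbps (10,000 Mb/s) link, matching the figure quoted above; the 50 sessions again come from the single-cartridge VSI max.

```python
def network_headroom(peak_mbps: float, link_mbps: float, sessions: int) -> dict:
    """Link utilization as a percentage, plus average bandwidth per session."""
    if link_mbps <= 0 or sessions <= 0:
        raise ValueError("link_mbps and sessions must be positive")
    return {
        "link_utilization_pct": peak_mbps / link_mbps * 100,
        "mbps_per_session": peak_mbps / sessions,
    }


# Peak of 130 Mb/s on a 10 Gbps link with 50 sessions.
stats = network_headroom(130.0, 10_000.0, 50)
print(f"{stats['link_utilization_pct']:.1f}% of link, {stats['mbps_per_session']:.1f} Mb/s per session")
```

In other words, peak traffic uses a bit over one percent of the link, which supports the point that the network is not the limiting resource for this workload.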

Rich Application Workload

Figure 16 through Figure 19 show an indicative test from each data point in Figure 15. Note that the numbers do not match exactly, as Figure 18 contains averages across multiple tests. Figure 20 through Figure 25 plot the system-level parameters (power consumption, GPU utilization, CPU utilization, memory utilization, and disk operations) during the execution of a 100 percent rich application workload on the HP ProLiant m710 Server Cartridge.

Summary

Throughout all these tests the performance was very predictable and the results scaled linearly, which provides validation and confidence that the HP ProLiant m710 cartridge can handle a multitude of XenApp applications, up to and including intense graphical applications that leverage the GPU on demand. It’s important to note that while a single Moonshot chassis can easily accommodate 2,000 or more XenApp users, “mileage” may vary, so performing your own tests will help you determine the scalability limits in your own environment. Of course, all this great testing and the information provided to you today couldn’t have happened without the help of some great guys at HP.

I would like to acknowledge Supreeth Venkatesh (@SupreethPSV) and Zach Bitting (@Zachbitting) for all their dedication in working on this project. You can reach the Moonshot solutions team at HP via @HP_MSE as well. We hope that this blog was informative and will allow you to see that HP Moonshot with XenApp can deliver an impressive user experience with breakthrough economics. In my next blog we will look at the architecture building blocks for the infrastructure, such as PVS, NetScaler, WDS, HP CMU, and other key components. It’s definitely been one small step for man, one giant leap for mankind with this new architecture, so stay tuned for more…