Overcoming Latency to Serve a Global User Population

This month, Citrix introduced a preview of Adaptive Transport for HDX, applying selected techniques previously available only with Framehawk to all ICA virtual channels to accelerate performance while maintaining bandwidth efficiency and low TCO. This article explores how enlightened data transport fits into an end-to-end latency reduction approach that lets users enjoy a snappy, delightful and productive experience with Cloud-hosted apps and desktops regardless of their location.

Consider the following scenario. You work in the IT department of a global organization. You love XenApp and XenDesktop because workspace virtualization simplifies management and is much more secure than running apps locally on every user’s device. But you’re wondering: is it possible to deliver apps, desktops and data across a continent or ocean and still give end users a responsive, delightful experience that keeps their productivity high? Or is the only solution to have at least one data center on each continent? What’s the right balance between user experience and cost for your organization? Is there a win-win?

There are many dimensions to a satisfying in-session user experience with virtual apps and desktops. Visual quality is important, especially for text and tiny graphics like Windows systray icons. Consistent response times matter, too, as a steady cadence improves productivity. And the most important factor is end-to-end roundtrip latency: the elapsed time between a user’s input (a mouse click or drag, a screen touch, or a keypress, for example) and the resulting update on their screen. I don’t mean just the first pixel update (“click-to-photon”), but the arrival of the content that actually matters to the user.

Mitigating Network Latency

Network latency (ping time) – largely a function of distance – is the most obvious component of the total latency that impacts user experience. It can be reduced by minimizing the number of hops between the user and the data center, as every device and switch along the way adds some buffering and delay. Ultimately, we reach the limits of the speed of light, and even our brightest scientists haven’t yet figured out how to change that! But there are techniques that can, in a sense, beat the speed of light.
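To make the speed-of-light limit concrete, here is a rough Python sketch of the minimum possible roundtrip time over a fiber path. It assumes light in fiber travels at about two-thirds of its vacuum speed and that the fiber route is exactly as long as the distance passed in; real routes are longer and every hop adds delay, so measured ping times will be higher.

```python
# Speed of light in a vacuum, in km per second.
SPEED_OF_LIGHT_KM_S = 299_792.458
# Assumption: light in optical fiber travels at roughly two-thirds of c.
FIBER_VELOCITY_FACTOR = 2 / 3


def min_rtt_ms(path_km: float) -> float:
    """Physical lower bound on roundtrip time over a fiber path of path_km.

    Ignores queuing, serialization and per-hop processing, which only add delay.
    """
    one_way_s = path_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(f"{km:>6} km of fiber: at least {min_rtt_ms(km):.1f} ms roundtrip")
```

By this estimate, roughly 10,000 km of fiber costs about 100 ms of roundtrip time before any processing happens at all, which is why the techniques that follow focus on hiding or working around latency rather than eliminating it.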

We can provide immediate mouse click feedback, a “zero latency” technique that changes the cursor to an hourglass right away, followed by the actual application feedback when it arrives over the network. This instant response encourages the user to wait briefly instead of clicking a second time, which could have an undesirable effect such as launching an application twice.

A gearing mechanism that measures network tension can maintain responsiveness even in the presence of packet loss. This is one of the patented technologies that we use in our unique Framehawk virtual channel (see my blog post). It’s the computer science equivalent of measuring the tension on a bicycle chain and it enables Framehawk to immediately detect an increase in packet loss and compensate for it.

HDX is also very clever about how it paints the screen, doing so in a way that’s pleasing to the eye. On low bandwidth connections, we can deliver images initially with lossier compression to get them on the screen quickly, and then improve image quality while the user is viewing the results.

But It Isn’t Just Network Latency

There are many other components to end-to-end latency besides network latency. Happily, there are technologies that can minimize how much delay each of these processes adds. Graphics rendering takes time, and using a graphics processor (GPU) rather than the CPU to render 3D graphics accelerates that operation dramatically. Capturing the rendered graphics from the frame buffer takes time; fortunately, capture technologies such as the Microsoft Desktop Duplication API (DDAPI), NVIDIA GRID, Intel Iris Pro and AMD RapidFire can reduce that. Encoding the screen image for transmission adds more delay; hardware-accelerated codecs and efficient software algorithms can help with that. Vertical synchronization (vSync) adds latency; some apps depend on it, but in many deployments it can be turned off to speed up the user experience with little negative impact. Finally, hardware-accelerated decoding technologies like HDX SoC, DXVA and VDPAU help reduce end-to-end latency, too.
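One way to reason about these contributions is as a latency budget. The Python sketch below treats the stages as strictly sequential and uses made-up per-stage figures purely for illustration (they are not measurements of HDX); in practice some stages overlap in a pipeline.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    ms: float


def end_to_end_ms(stages: list[Stage]) -> float:
    """Total input-to-display latency, assuming stages run one after another."""
    return sum(s.ms for s in stages)


def without(stages: list[Stage], name: str) -> list[Stage]:
    """Return the budget with one stage removed, e.g. to model turning off vSync."""
    return [s for s in stages if s.name != name]


# Hypothetical per-stage figures in milliseconds, for illustration only.
budget = [
    Stage("network roundtrip", 100.0),
    Stage("render", 8.0),
    Stage("capture", 4.0),
    Stage("encode", 6.0),
    Stage("vsync wait", 8.0),
    Stage("decode", 4.0),
]

if __name__ == "__main__":
    print(f"total: {end_to_end_ms(budget):.0f} ms")
    print(f"without vSync: {end_to_end_ms(without(budget, 'vsync wait')):.0f} ms")
```

Even in this toy budget the non-network stages add up to a noticeable share of the total, which is why accelerating each of them is worthwhile even when the network dominates.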

Compression, Caching and QoS

Latency can be minimized by reducing the amount of data that needs to be sent over the network to update the user’s screen. Compression and caching enable smaller, less expensive network pipes to be used. To deliver a highly interactive experience over a narrow pipe or one that is shared with other competing data streams, we need to send as little data as possible to accomplish the desired visual updates. The challenge is to use encoding techniques that are also CPU-efficient or able to leverage hardware acceleration. HDX uses Adaptive Display technology and a SuperCodec to intelligently identify different types of screen content and use the most appropriate encoding technology. By default, H.264 is used selectively, for transient screen regions (video), and other codecs are used for text and images. The omnidirectional SuperCache reuses previously transmitted data when scrolling and panning, further reducing how much data needs to be sent.
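The general idea behind reusing previously transmitted data can be sketched as a content-addressed tile cache: the sender hashes each screen tile and, when the client should already hold an identical tile, sends a short reference instead of the pixels. This is a hypothetical illustration (the `TileCache` class is invented here), not a description of how SuperCache is implemented; it also assumes the client keeps every tile it has received, whereas a real cache would be bounded and kept in sync on both ends.

```python
import hashlib


class TileCache:
    """Hypothetical sender-side cache of screen tiles the client is assumed to hold."""

    def __init__(self) -> None:
        self._sent: set[str] = set()

    def encode(self, tiles: list[bytes]) -> list[tuple[str, object]]:
        """Return ("ref", digest) for tiles already sent, ("data", tile) otherwise."""
        out: list[tuple[str, object]] = []
        for tile in tiles:
            digest = hashlib.sha256(tile).hexdigest()
            if digest in self._sent:
                out.append(("ref", digest))  # client redraws from its own copy
            else:
                self._sent.add(digest)
                out.append(("data", tile))  # first sighting: send the pixels
        return out


if __name__ == "__main__":
    cache = TileCache()
    a, b, c, d = (bytes([n]) * 64 for n in range(4))
    print([kind for kind, _ in cache.encode([a, b, c])])  # frame before scrolling
    print([kind for kind, _ in cache.encode([b, c, d])])  # frame after scrolling
```

After a scroll, most tiles in the new frame were already on screen in the previous one, so only the newly exposed tile travels as pixel data.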

QoS (Quality of Service) is invaluable in prioritizing real-time traffic ahead of bulk file transfers and print jobs. The various virtual channels that make up the ICA protocol can be split across multiple data streams, and packet tagging (DSCP and WMM) also supports QoS.
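DSCP marking can be applied per socket. The Python sketch below sets a DSCP value on a UDP socket through the standard `IP_TOS` socket option; choosing Expedited Forwarding (EF, 46) for interactive traffic and CS1 (8) for low-priority bulk data follows common convention and is an assumption here, not Citrix’s marking scheme. Markings only help if the routers and switches along the path are configured to honor them.

```python
import socket

# Common DSCP code points: EF (Expedited Forwarding) for interactive traffic,
# CS1 for low-priority bulk data. Which values to use is a policy decision.
DSCP_EF = 46
DSCP_CS1 = 8


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS / traffic class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def tagged_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Honored on Linux and macOS; Windows generally ignores per-socket TOS
    # values and applies DSCP marks through QoS policy instead.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
    return sock


if __name__ == "__main__":
    interactive = tagged_udp_socket(DSCP_EF)  # e.g. display updates
    bulk = tagged_udp_socket(DSCP_CS1)        # e.g. print jobs
    print(interactive.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
    interactive.close()
    bulk.close()
```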

Data Throughput and the Limitations of TCP and UDP

Finally, let’s consider data throughput, the topic that inspired me to write this blog post and the problem addressed by our new enlightened data transport layer. Data throughput is a major factor in user experience. The faster we can get the encoded data over to the user’s device, the better. And this is highly dependent on the intricacies of the transport protocol.

TCP (Transmission Control Protocol) has been widely used for many years because of its robustness and reliability; most ICA virtual channels run over TCP today. This includes Thinwire, our primary display remoting virtual channel, known for being light on bandwidth and on server CPU and RAM.

UDP (User Datagram Protocol) is generally used for real-time communications, where getting packets across the network quickly is the top priority and it doesn’t matter much if some are lost along the way. The HDX RealTime Optimization Pack for Skype for Business uses UDP and RTP for audio-video transmission, with Forward Error Correction to ensure good quality even in the presence of considerable packet loss. Likewise, other real-time communications solutions for XenApp and XenDesktop use UDP, such as HDX Audio, the Cisco VXME optimization pack for Jabber, and the Avaya VDI Communicator for One-X Agent and One-X Communicator.
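Forward Error Correction works by sending redundant data so the receiver can rebuild lost packets without waiting a full roundtrip for a retransmission. The simplest form is a single XOR parity packet per group, sketched below in Python; this is a generic illustration, and the schemes actually used by the products named above are more sophisticated and are not described here.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets in a parity group must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group: List[bytes]) -> bytes:
    """Compute one parity packet protecting a group of packets."""
    return reduce(xor_bytes, group)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) from the survivors plus parity."""
    lost = [i for i, p in enumerate(received) if p is None]
    if len(lost) > 1:
        raise ValueError("single XOR parity can repair only one loss per group")
    survivors = [p for p in received if p is not None]
    if not lost:
        return survivors
    # XOR of the parity with every surviving packet leaves exactly the missing one.
    rebuilt = reduce(xor_bytes, survivors, parity)
    repaired = list(survivors)
    repaired.insert(lost[0], rebuilt)
    return repaired
```

The cost is one extra packet per group; the benefit is that a single loss in each group is repaired at the receiver with no added latency.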

Each of these transport technologies, TCP and UDP, has pluses and minuses.

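UDP is, by definition, unreliable: it provides no acknowledgements or retransmission, so lost packets are simply gone. This makes it unsuitable for many use cases. UDP also suffers from fairness issues because it has no built-in congestion control; unless the application adds its own, a UDP stream can crowd out competing traffic.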

TCP ensures reliable data transfer and it is a fair protocol: each data stream gets its fair share of the available bandwidth. But TCP suffers from latency bias, which limits its ability to saturate the available bandwidth on networks with a high bandwidth-delay product. When network latency is high, TCP is slow to grow its “window” (the maximum amount of data the sender can transmit without receiving a TCP acknowledgement; in other words, the maximum number of “bytes in flight” that have been sent and are traversing the network but remain unacknowledged). This results in a lag in responsiveness. Also, TCP treats each loss of data as a sign of congestion and shrinks its window substantially, which makes it especially inefficient when latency is combined with packet loss, even when the amount of loss is small.
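The latency bias can be put into numbers. A sender can transmit at most one window per roundtrip, so throughput is capped at window ÷ RTT, and filling a link requires a window at least as large as its bandwidth-delay product. For random loss, the widely cited Mathis approximation estimates steady-state TCP Reno throughput as (MSS ÷ RTT) × 1.22 ÷ √p. The Python sketch below evaluates these formulas; the example figures are illustrative, and modern congestion-control algorithms such as CUBIC or BBR behave differently from the Reno model.

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes may be unacknowledged."""
    return window_bytes * 8 / rtt_s


def mathis_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis approximation of steady-state TCP Reno throughput under random loss."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Illustrative figures: a 100 Mbps path with a 200 ms roundtrip.
    print(f"BDP: {bdp_bytes(100e6, 0.200) / 1e6:.1f} MB")
    print(f"64 KiB window cap: {window_limited_bps(65_536, 0.200) / 1e6:.1f} Mbps")
    for rtt in (0.020, 0.200):
        mbps = mathis_bps(1460, rtt, 0.01) / 1e6
        print(f"RTT {rtt * 1000:.0f} ms, 1% loss: about {mbps:.2f} Mbps")
```

With 1% random loss, for example, the Mathis estimate falls from about 7.1 Mbps at a 20 ms RTT to about 0.71 Mbps at 200 ms: ten times the latency, one tenth the throughput, no matter how much bandwidth the link actually has.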

NetScaler SD-WAN WAN Opt (formerly CloudBridge) offers WAN optimization technology that can greatly help to overcome TCP’s shortcomings on high latency or low bandwidth connections. It provides TCP flow control to improve responsiveness and tokenized compression, de-duplication and caching in order to reduce how much data needs to be sent while making optimal use of available bandwidth.

But what about users working from locations where there is no WAN accelerator? Or workers on wireless connections where stochastic packet loss can intermittently cause excessive lag?

Framehawk – Ideal for Workers on Wi-Fi and 4G/LTE

The Framehawk virtual channel introduced in 2015 provides a brilliant solution for mobile workers. Framehawk overcomes the limitations of TCP and UDP with its own proprietary data transport layer. This transport layer is built on top of UDP but is able to provide reliable transmission when desired. And it provides much faster data throughput at high latency than TCP, without compromising throughput on low latency LAN connections. It maintains constant awareness of network conditions and adjusts dynamically.

On top of this data transport layer, Framehawk adds technologies such as a QoS signal amplifier, time-based heat map, human factors based encoder and client-side “intent engine” to deliver an amazingly responsive experience for mobile workers on broadband wireless connections even in the presence of very high intermittent packet loss and high latency, such as often result from multipath signal propagation and spectral interference. This extra processing uses considerably more server CPU, memory and network bandwidth than Thinwire but provides an unparalleled quality of experience for highly mobile workers who need to be productive even when the network connection is poor. And now, with the new 7.12 release of XenApp and XenDesktop, Framehawk has been further enhanced to support NetScaler Gateway High Availability (HA).

Somewhat unexpectedly, many customers began to use Framehawk with HDX 3D Pro, our technology stack for professional graphics apps, even before it was officially supported for this use case. And their primary target wasn’t the mobile worker, although that use case proved to be valuable, too. These customers often have a compelling business need to keep their data in a single data center. This may be to protect their intellectual property as they employ contractors around the world. And it is often also because transferring their enormous 3D models over a network connection can take an incredibly long time, during which the design engineer is unproductive. Customers in industries such as Automotive, Aerospace, and Architecture, Engineering & Construction (AEC) implemented Framehawk and sent us glowing comments like these:

At a whopping 500ms of roundtrip latency between Detroit and India, “while the difference between LAN and WAN was perceptible, it was barely so.”

A global CSP remarked that with Framehawk they can now deliver 3D apps to Oil & Gas industry users “in the middle of nowhere” (over 250 ms roundtrip latency, high jitter).

“Very impressive results… Our overseas [architects] are happy. In fact, it is the only way they can be productive. Hands down, Framehawk is a game changer.”

For these customers, increased worker productivity easily justifies Framehawk’s higher consumption of server CPU and RAM compared to Thinwire. And while Framehawk was designed for broadband wireless connections, it also works well on the broadband WAN connections that these customers generally have in place between their various sites, since the incremental bandwidth consumption per user is much lower than the base bandwidth required.

But many of these customers longed for dual-monitor or 4K resolution support, which simply isn’t practical with Framehawk because it was designed for laptops and tablets.

And what about all of our other customers who need high server scalability and low bandwidth consumption to save costs, but who also have workers on high latency connections?

This got us thinking. Could the intelligent data transport techniques pioneered with Framehawk benefit Thinwire? What if we made an enlightened data transport layer available to all ICA virtual channels? Would it also speed up printing? File transfers? USB redirection?

New: Adaptive Transport and EDT

Now, with 7.12, you can evaluate the Preview of a network-aware HDX data transport engine that benefits Thinwire and the other ICA virtual channels, too. This proprietary transport layer, Enlightened Data Transport (EDT), provides highly efficient congestion and flow control, achieving significantly faster data throughput than TCP while remaining fair, reliable and consistent. You can immediately evaluate this next-generation HDX technology on your corporate WAN (or with a WAN emulator) to see how it handles high network latencies, even with moderate packet loss. And soon you will also be able to evaluate it with NetScaler Gateway to see how it will benefit off-net workers such as work-from-home employees.

Adaptive Transport is a powerful new engine for HDX. In fact, you might compare it to a hybrid engine in a sporty automobile. Like a high-torque electric motor, it delivers incredible acceleration when it can utilize the full EDT stack (our proprietary transport layer). And it seamlessly switches to TCP when that’s all the network can provide, much as a hybrid car smoothly switches to gasoline power when necessary (no need to pull over to the side of the road).
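The “prefer the fast engine, fall back when necessary” behavior can be sketched as a simple try-then-fall-back pattern. In this hypothetical Python sketch, `connect_edt` and `connect_tcp` stand in for whatever actually establishes each transport; they, and the `TransportUnavailable` exception, are invented for illustration and do not reflect how HDX actually negotiates transports.

```python
from typing import Callable


class TransportUnavailable(Exception):
    """Hypothetical error raised when a transport cannot be established."""


def connect_adaptive(
    connect_edt: Callable[[], str],
    connect_tcp: Callable[[], str],
) -> str:
    """Try the UDP-based transport first and fall back to TCP if it is unavailable."""
    try:
        return connect_edt()
    except TransportUnavailable:
        # e.g. a firewall blocks UDP: keep the session alive over TCP instead.
        return connect_tcp()


if __name__ == "__main__":
    def udp_blocked() -> str:
        raise TransportUnavailable("no UDP path to the host")

    print(connect_adaptive(lambda: "EDT", lambda: "TCP"))
    print(connect_adaptive(udp_blocked, lambda: "TCP"))
```

A real implementation would also need a timeout on the UDP attempt, since a firewall that silently drops UDP produces no error, only silence.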

You can use EDT with standard, modern Thinwire (often affectionately called “Thinwire Plus” to distinguish it from legacy Thinwire) and with Thinwire-H.264, including with NVIDIA’s NVENC hardware encoding technology. And if you’re still running a legacy version of Windows, you can use EDT with Legacy Thinwire, too.

To learn more about this new technology and how you can immediately evaluate our Preview release, please review the updated ICA policy settings documentation. And we encourage our partners to attend Citrix Summit next month for training on this next generation technology.

We’ll look for your feedback on our Discussions Forum. There are various ways you can help the Citrix community with your observations. Think of the many different network scenarios where XenApp and XenDesktop customers will want to know which technology provides the best fit, for example:

Different latencies and levels of packet loss
Different virtual channels
EDT performance versus TCP with WAN optimization
Thinwire-over-EDT versus Framehawk, at different latencies and levels of packet loss
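For WAN-emulator testing, a Linux host running `tc` with the `netem` queuing discipline can impose delay and random loss. The Python sketch below generates one `tc` command per combination of latency and loss; the interface name and the chosen values are assumptions for a hypothetical lab setup. Note that netem affects only packets leaving that interface, so if it is applied in only one direction, `delay 150ms` adds 150 ms to the roundtrip, not 300 ms.

```python
from itertools import product

# Hypothetical lab settings: the emulator's egress interface and the grid to test.
IFACE = "eth0"
DELAYS_MS = [50, 150, 300]
LOSS_PCT = [0, 1, 3]


def netem_command(iface: str, delay_ms: int, loss_pct: float) -> str:
    """Build a Linux tc/netem command adding fixed delay and random loss on egress."""
    cmd = f"tc qdisc replace dev {iface} root netem delay {delay_ms}ms"
    if loss_pct:
        cmd += f" loss {loss_pct}%"
    return cmd


if __name__ == "__main__":
    # Print one command per scenario; run each (as root) before a test pass.
    for delay, loss in product(DELAYS_MS, LOSS_PCT):
        print(netem_command(IFACE, delay, loss))
```

Running `tc qdisc del dev eth0 root` afterwards removes the emulation. Repeating each scenario with Thinwire over TCP, Thinwire over EDT and Framehawk gives directly comparable results.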

And, of course, we’re keen to quickly gather feedback on the Preview code so that we can harden the solution for production deployment.

My expectation is that Thinwire will not only remain the primary ICA virtual channel for display remoting but will become even more pervasive thanks to improved performance when running over EDT, while Framehawk remains the “big guns” solution for mobile workers on broadband wireless connections that suffer from high packet loss as the user roams. On many long-haul network connections where packet loss is low to moderate, Thinwire-over-EDT will be a great option thanks to its low resource consumption and multi-monitor support.

So, how many data centers will you need to serve a global user population? Well, the answer is, as always, “it depends.” But with HDX and Adaptive Transport, you won’t need as many as you would with a plain UDP-based or TCP-based remoting protocol. And whether you’re considering data center consolidation or a move to the Cloud, Citrix’s industry-leading HDX technologies will help your organization cost-effectively optimize user experience and maximize worker productivity.

Derek Thorslund
Director of Product Management, HDX
