Lossless Compression: Lowering the Cost of Pixel Perfection

I thought I’d take some time to talk about image compression, an important part of Citrix technology and a topic that I’m particularly passionate about.

In XenApp and XenDesktop, 99% of the images that end up on a user’s screen have been compressed (and eventually decompressed) by one means or another.

Take, for example, our most recent addition to Thinwire, Selective H.264. Here, we are able to identify regions of the screen that are rapidly changing, for example, a YouTube video, and encode this region using H.264. Being a contextual based codec, H.264 is brilliant at delivering high quality video when bandwidth is limited. So, we use it for exactly that, and the rest of the screen can be dealt with by Thinwire’s traditional encoding techniques. The following example shows this quite nicely:

The green rectangle shows what Thinwire has detected a region for Selective H.264 encoding (Adaptive Display v2). The use of H.264 is negotiated with the end-point Receiver, and if it is not supported, variable quality JPEGs will be used instead.

Next, Thinwire efficiently decomposes the rest of the screen and uses JPEG (lossy, red) for photographic or complex imagery and RLE (lossless, blue) for text and simple imagery. In the latter case (that can often form a large part of what the user actually sees for day-to-day work), bitmaps will be compressed with the Citrix lossless codec, known as 2DRLE. As the name suggests, this is an implementation of the well-known and well-understood compression scheme “Run Length Encoding”.

Run Length Encoding (RLE)

The general idea behind RLE, when applied to two-dimensional image data (a.k.a. “a bitmap”), is to express runs (repetitions) of the same pixel in a more efficient manner. For example, consider the following sequence of pixels:

ABABABCCCCCCCBBBBBABABAB

Assuming 8-bits (1 byte) per pixel, 24 bytes are needed to express this sequence.

Now, notice how the pixel “C” repeats 6 times followed by a 5-long string of “B”s. The same string of pixels can be encoded as:

ABABAB<7>C<5>BABABAB

Assuming the counts can also be stored as a byte, this version only needs 16 bytes of memory. That’s a saving of 8 bytes, or 33%! OK, we’re only talking about a measly 8 bytes in this case, but with ultra-high resolutions becoming more common, the savings RLE offers can easily run into the megabytes.

Here, I’ve described a very simple implementation of RLE. There are various techniques which further improves the compression ratios achieved by the basic implementation. Indeed, our 2DRLE encoder has undergone years of optimisations and improvements to make it a very efficient lossless encoder – certainly more efficient than anything else I’ve come across, and believe me, I’ve looked long and hard! To give you an idea of how 2DRLE performs, I ran a quick benchmark on a sample set of 8,947 bitmaps (ranging from 4 x 4 pixels all the way up to 1280 x 1024) captured from a live desktop session that were destined for lossless compression, and compared the results against another popular lossless image file format, PNG:

Lossless Compression	Total uncompressed size (bytes)	Total compressed size (bytes)	Time taken (milliseconds)
2DRLE	360,580,256	14,088,798	7,700
PNG*	360,580,256	16,357,146	20,200

(*libpng was used to compress sample data)

As well as achieving a better compression ratio than PNG, 2DRLE is over 2.5x quicker. That last point is especially important: compression speed is absolutely critical for session interactivity and performance. PNG will probably compress better on more complex photographic data, however, 2DRLE expects to see very little of this in reality.

It’s been a personal challenge/pet-project of mine to beat the highly-evolved 2DRLE, and over the past year I’ve experimented with various ideas. Some have yielded very good compression ratios, but always at the expense of CPU or memory. Others not so good, but that’s part of the challenge: sometimes you win, sometimes you lose.

And recently, I won. Here’s a sneak peek at my latest results:

Lossless Compression	Total uncompressed size (bytes)	Total compressed size (bytes)	Time taken (milliseconds)
2DRLE	360,580,256	14,088,798	7,700
PNG	360,580,256	16,357,146	20,200
MD_COMPRESS	360,580,256	12,398,102	7,750

So far, I’ve shown a 12% improvement over 2DRLE with next to no increase in CPU usage! I’ve also experimented with other publicly available lossless image formats like FLIF or Google’s WebP. FLIF generally achieved a better compression ratio than MD_COMPRESS but – and it’s a big but – it was 12x slower! Similarly, WebP did marginally better but again was far too slow for real-time use. My point here – and it’s an important one – is that, in addition to end results, we pay close attention to all the resources required during compression (server) and decompression (client). This allows us to make clear, data-driven decisions, and ensures HDX retains its top-spot as the #1 remote access protocol.

What does this mean in the bigger picture? A 12% reduction in bandwidth means fewer bytes sent over the wire, and if you’re paying for your bytes—for example, a cellular data plan—well, that’s a cost saving right there. Or perhaps an extra user or two on the same bit of pipe. Either way, it’s a nice to have with little to no extra resource cost.

At Citrix, we’re always looking for ways to give our users the best experience possible. And if that means finding innovative new ways of doing things rather than re-using existing tech, sitting back and saying “that’s good enough”, then it makes my job that bit more rewarding 🙂 My philosophy is that it’s never good enough.

Topics

Products