In the 6.0 release of Application Streaming, we moved away from storing Application Streaming content in CAB files, changing to an unzipped DIR structure. As noted in this earlier post, this was done to simplify “deploy” in offline scenarios and also to enable movement of the streaming targets into VHD files for layered mounting on pooled XenDesktop. This mounting stuff comes in the “soon” release. Going to DIR format was a first step in teaching the streaming system to mount VHDs.
In concept, cab file usage in App Streaming is now “ancient history”. In reality, many of you are still on version 5.2 and some of you have indicated a reluctance to move away. You like CAB files!
Today’s post digs deeper into CAB files
I encountered some highly skilled technical folks recently who were mashing DIR format back into CAB. This is very “on the edge”. Hacking is fun, but hear this: this must be done VERY CAREFULLY, if at all.
Not all cab files are created equal
The Application Streaming profiler (5.2 and before) is very careful about creating CAB files. While they look like, smell like and behave like every other cab file on the planet, the “stuff” inside the cab file is stored in a “more optimized” form. Yes, the cab file can be read and processed by any CAB-capable program, but dig deep and you’ll see that the CAB files produced by the streaming profiler are … better.
In the beginning of Application Streaming back in the Citrix Presentation Server 4.0 time, we had some lengthy conversations about data formats for application streaming PROFILES (then called “packages”) and TARGETS (the executable layer).
One thing we were SURE of is that we wanted a single file to hold everything in the execution target.
Considering offline as a major use case, we wanted an atomic test of whether a target was completely copied to the deploy space. If the single file is there, then it is “copied”, else “not copied”. No need to worry about partial data transfers. The same is true for saving a profile to the Application Hub during profiling. It is assumed that the profile is in active use when profiles are saved, and saving a single file makes for an all-or-nothing operation.
Question was, WHICH container format?
CAB, ZIP, TAR, RAR, DIR
We experimented with a number of potential storage formats and even coded three of them: DIR, CAB and ZIP. This was a couple of years before App Streaming version 1.0 released in Presentation Server 4.5. Debate continued well past that first release, with each member of the team having their own personal favorite, and I’ll add that this debate continues many years after the original decision.
For the record, my desired format was RAR – it was never really a contender.
There were a number of advantages and disadvantages of each. Focusing on CAB vs. ZIP, here are a few:
- ZIP files can be updated in place, adding, updating and deleting files without recreating the whole ZIP.
- CAB files by contrast must be completely re-created any time they are modified, because the header sits at the front of the file.
- CAB files have Microsoft CABINET.DLL, which ships with EVERY version of Windows.
- ZIP files need a library of code from some good source to access the files. Workable.
- ZIP files are limited in size to 2GB
- CAB files can “span” multiple CABs to eliminate 2GB limit
- CAB files can be read by any utility program that can view CAB files, such as Windows explorer.
- ZIP files, at least back then, were not universally viewable. That’s gone today, but it was a thing.
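To make the “update in place” point concrete, here is a minimal sketch using Python’s standard zipfile module. It only shows the append case; the file and member names are made up for illustration, and none of this is code from the streaming profiler.

```python
import os
import tempfile
import zipfile


def append_to_zip(zip_path, name, data):
    """Add one member to an existing ZIP; members already stored are not rewritten."""
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)


# Hypothetical container and member names, purely for illustration.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "target.zip")

# Create the container with one file in it.
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("app/file1.txt", b"first file")

# Later: add a second file without rebuilding the container.
append_to_zip(zip_path, "app/file2.txt", b"second file")

with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
```

The new member is written at the end and the table of contents, which ZIP keeps at the back, is rewritten after it. A format with its header in front does not get that for free.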
It can be argued that CAB is both awful and excellent at storage of stuff inside the CAB. Bottom line, we went with CAB. It seemed like a bad decision at the time, but it turned out to be quite good.
Efficiency of file extract
CAB files are great for storage of a whole bunch of files, where one wants to extract ALL FILES when extracting any file, but CAB files in their normal format really suck if you want to randomly extract a single file from the CAB.
This is EXACTLY the access pattern that Application Streaming uses when streaming in a single file, so you’d think this is a road block, as each file “streamed” will be exactly this “random” file extract.
True in concept. In reality, CAB files can handle this just fine and, with the test of time, we now know that not only is it not slow, but extracting files for App Streaming from CAB files is FASTER than copying them across the network.
It wasn’t enough of a delta to have us move away from DIR in 6.0, but it is still true!
Yes, we knew this ahead of time.
Now we are on to the subject of this post
Customer doesn’t like DIR format of App Streaming introduced in version 6.0 because DIR format doesn’t stream from a web server App Hub as well as CAB (their words, not mine). Okay, my words too. Customer complains that DIR format means that MIME TYPES have to be (*) and they also note that performance against CAB is better than same in DIR.
Customer hypothesis: App Streaming 6.0 client can still run packages from version 5.2 and that means it still knows how to handle CAB. If I feed it a CAB, it will have no choice but to open it and then my web activity will all be better. There are a million assumptions in this, but the concept is actually good. Notice I didn’t say it will work. It’s real good for hacking though.
Problem: They produced the CAB file using Microsoft CabArc. While this is a fine tool for doing what it does, it lacks the elegant formatting that the streaming profiler uses and this is a … big problem.
Think in terms of streaming 100 times slower than in the before case. 1000 times?
CAB files and CabArc were written with the idea of installation programs. Place a whole bunch of files into a single container and then extract all of them in one go. For efficiency, the files are COMPRESSED when they are in the CAB file.
The million dollar problem. The compression is across ALL FILES in the CAB.
The compression Window in the above graphic is represented by the BLUE. Notice it is shared across all files in the CAB. This means that to extract a file from the CAB, the CAB library has to extract all earlier content.
For example, to extract “file 1”, no problem. Find “File 1” in the header table of contents, seek to the start of the compressed data, read file 1. It’s right there at the front! DONE. If you then move on to file 2, again no problem, just keep de-compressing and placing files in the output space. This is how CabArc “extract many” works.
For application streaming though, we want a SINGLE file at a time.
If we’re after “File 17” in the above list, the file is stored COMPRESSED and that storage of file 17 is dependent on the compression history of “File 1” through “File 16”. To get File 17, the cab library has no choice but to first read files 1 thru 16, so that when it extracts 17, it has the compression history to know what it has.
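Here is a small sketch of that dependency. It is an analogy only: it uses Python’s zlib rather than the real CAB library or its compression formats, and the file names and contents are invented. All files go through ONE compressor, the way a single CAB “folder” works, and pulling out one file means decompressing everything in front of it.

```python
import zlib

# Made-up stand-ins for the files in a CAB.
files = {f"file{i}.txt": (f"contents of file {i}\n" * 200).encode()
         for i in range(1, 21)}


def build_solid(files):
    """Compress every file through one shared compressor (one compression history)."""
    comp = zlib.compressobj()
    blob = bytearray()
    toc = {}  # name -> (offset in the UNCOMPRESSED stream, size)
    offset = 0
    for name, data in files.items():
        blob += comp.compress(data)
        toc[name] = (offset, len(data))
        offset += len(data)
    blob += comp.flush()
    return bytes(blob), toc


def extract_from_solid(blob, toc, wanted, chunk=64):
    """Return (file bytes, total bytes decompressed to reach them)."""
    offset, size = toc[wanted]
    decomp = zlib.decompressobj()
    out = bytearray()
    pos = 0
    # There is no way to start in the middle: decompress from the very
    # beginning until the wanted file's bytes have been produced.
    while len(out) < offset + size and pos < len(blob):
        out += decomp.decompress(blob[pos:pos + chunk])
        pos += chunk
    return bytes(out[offset:offset + size]), len(out)


blob, toc = build_solid(files)
data1, touched1 = extract_from_solid(blob, toc, "file1.txt")
data17, touched17 = extract_from_solid(blob, toc, "file17.txt")
```

Compare `touched1` with `touched17`: the deeper the file sits in the shared stream, the more data has to be decompressed and thrown away just to reach it.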
In CAB parlance, the compression history is called a “Folder”. This has nothing to do with a “Directory”. Each file is stored in the CAB file with the PATH included. There’s really no such concept of a directory, just a deep path to where the file should be placed when it is extracted.
In CAB, a “folder” is a set of files in a CAB file that all share a single compression history. For CabArc, all files in the CAB file are stored in a single “folder”. Bingo! Big problem! Compare this to Citrix Application Streaming produced CAB files and things get more clear. And – colorful!
CAB file storage under Citrix Application Streaming.
In the above, colors again represent compression windows (“Folders”). With Citrix Application Streaming Profiler, when the CAB file is created, the “folder size” is 1 file. Each time a file is added to the CAB file, the streaming profiler tells the CAB library to reset the compression history.
This results in a CAB file that is slightly larger than the same CAB file created with CabArc, but the random extraction of files is hundreds of times more efficient.
To get “File 17”, the streaming client opens the CAB file. The header is read by the CAB library. When “File 17” is wanted the CAB system seeks to the desired position in the CAB file where the compression folder starts for File 17 (note: 1 file per “folder”). It then reads the number of compressed bytes needed to re-produce the original “File 17” and it’s done.
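The SEEK, READ, de-compress sequence can be sketched like this. Again it is an analogy built on Python’s zlib, not the CAB library: each file is compressed on its own (a fresh compressor stands in for “reset the compression history”), and a small index records where each file’s compressed bytes live. The names, the index layout and the sample data are all assumptions for illustration.

```python
import io
import zlib

# Made-up stand-ins for the files in a CAB.
files = {f"file{i}.txt": (f"contents of file {i}\n" * 200).encode()
         for i in range(1, 21)}


def build_per_file(files):
    """Compress each file independently and record where its bytes start."""
    blob = io.BytesIO()
    index = {}  # name -> (offset of compressed bytes, compressed length)
    for name, data in files.items():
        packed = zlib.compress(data)  # fresh compressor: no shared history
        index[name] = (blob.tell(), len(packed))
        blob.write(packed)
    return blob.getvalue(), index


def extract_one(container, index, wanted):
    """SEEK to the file's own compressed bytes, READ them, decompress."""
    offset, length = index[wanted]
    container.seek(offset)
    return zlib.decompress(container.read(length))


blob, index = build_per_file(files)
container = io.BytesIO(blob)  # "opened once" and then left open
data17 = extract_one(container, index, "file17.txt")
data3 = extract_one(container, index, "file3.txt")  # later get, no re-parse
```

Only the compressed bytes of the wanted file are read; nothing that belongs to “File 1” through “File 16” is touched.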
As a side benefit, the CAB file is “opened once” and then left open. Later file gets occur without having to re-parse the header; you end up with SEEK, READ, De-Compress, done. This is fast, but it is all dependent on careful creation of the CAB file at the start.
Fairness to CabArc
CabArc DOES include the ability to reset “Folder”. Quoted from the CabArc link:
When creating a CAB file, the plus sign (+) may be used as a filename to force a folder boundary. For example,
cabarc n test.cab *.c test.h + *.bmp
While that looks promising, it’s not enough. If you do this, then you cannot use wildcards to create the CAB with one file per folder, and you also cannot recurse into subdirectories, so it’s not really going to help.
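For the experimenters: one conceivable way around the wildcard limitation is to have a script list every file explicitly and put a “+” between each pair. The sketch below only BUILDS such an argument list; it does not run CabArc. Whether CabArc stores the paths the way the streaming client expects, and whether a large tree fits within the command-line length limit, are open questions I have not verified. The function name and the directory layout are hypothetical.

```python
import os


def cabarc_args(cab_name, root):
    """Build a 'cabarc n' argument list with a '+' folder boundary between files."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            paths.append(os.path.join(dirpath, filename))

    args = ["cabarc", "n", cab_name]
    for i, path in enumerate(paths):
        if i > 0:
            args.append("+")  # force a folder boundary before this file
        args.append(path)
    return args
```

For a tree holding a.txt and sub/b.txt, this yields cabarc, n, the CAB name, the path to a.txt, a “+”, and then the path to sub/b.txt. Treat it as a starting point for experiments, not as a production recipe.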
Tools to continue hacking
Experiments are good and we can learn much. While I would actively guide against doing things like DIR -> CAB in production, it’s good stuff for experimentation. Where can one find a tool that will produce CAB files with all the excellence of the Citrix Streaming Profiler version 5.2?
My internet search came up empty.
I know the streaming profiler can do it, but in 6.0, it produces DIR with no ability to produce CAB. Version 5.2 will produce CAB, but it won’t have the other good things in 6.0.
I can imagine that a tool could be written to do this, but as of this moment, none exists. So, this is all a good theoretical discussion about having App Streaming 6.0 use CAB format, but the bottom line is that this lack of “good” CAB files is a significant block. CAB files must be created properly for efficient performance and since they can’t be, going this route is problematic.
Please share your thoughts…
Citrix Systems Personalization Architect
Application Virtualization and Profile Management