I’ve been debugging a good one lately: Discretionary Access Control List (DACL) programming with Citrix App Streaming. Recall from a previous post that Citrix App Streaming dynamically adjusts user access rights on the execution cache as applications start running in a sandbox and as application isolation spaces terminate. This is very cool stuff, but it is presently creating some headaches. This post describes the headaches.
The quick review of the earlier post is that users on a XenApp Server are allowed to see the application installation content ONLY for the applications that they are presently running. Applications that they are not running, they cannot see, even if those applications are presently running on that machine, right now, supporting other users.
From the layers of glass, I’m talking about the middle layer here.
The middle layer is SHARED across all of the isolation spaces. If multiple users are on the machine, only one copy of the installation image exists for all. This space is stored below \Program files\Citrix\RadeCache. It SHOULD be below the streaming client subdirectory, but it isn’t – I digress.
At installation, the streaming client installer sets permissions on the RadeCache space so that the Streaming Service user account (Ctx_StreamingSvc) has “full rights” and “users” have no rights, not even file scan. At runtime, the streaming client grants the specific user Read+Execute rights to the GUID_V subdirectory that holds the execution content for this specific application. This means that if a XenApp server has 100 users on it, but right now, you are the single user running “Toontown”, then ONLY you will be able to see the Toontown bits.
It’s even more restrictive than not being able to see the bits: you can’t even see what folders exist below the RadeCache because you never have file scan for that space. Interestingly, you CAN “CD” into the subdirectory if you know the GUID_V, though this works only while the application is “running”.
Compare this to locally installed apps. On a XenApp Server, a USER can “file open” and browse to \program files and then view all of the applications that are installed on that server. For App Streaming delivered apps, though, things are more locked down: the user cannot “see” application bits for applications that they themselves are not running.
This is very elegant. Elegant can be problematic. The user rights are granted using the Windows API SetNamedSecurityInfo, the programmatic counterpart of what ICACLS.exe does from the command line. The headache starts in how this API implements ACL assignment.
We are around 4 years into this App Streaming thing, and all this DACL stuff has existed since the beginning. After 4 clean years, I have seen it break down twice in the last 6 months.
Problem #1: A customer has some really massive servers, and when the individual user count hit 900, the DACL set failed and the application launch failed. Hmm. Why did it fail? A DACL is limited to roughly 64KB, and the SIDs of all the authorized users started to add up. The customer didn’t have 900 users on the server; they had many hundreds, but they also had a few “blowed up” sessions where the service didn’t have a chance to remove rights, so the entries built up over months until the DACL was full. The streaming client didn’t notice the failure on the DACL set, but it then failed to CreateProcess the application, so the launch failed. Solution: purge the DACLs on a regular basis. This isn’t today’s conversation.
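For the curious, here is a rough back-of-the-envelope sketch of how many access-allowed entries fit in one DACL. It assumes the standard Windows binary layouts (an 8-byte ACL header, an 8-byte fixed part per ACE, and a SID of 8 bytes plus 4 per sub-authority) and one entry per user with a typical 5-sub-authority domain SID. The function names are mine, not anything from the streaming client.

```python
# Sizes below come from the Windows ACL / ACE / SID binary layouts.
ACL_HEADER_BYTES = 8      # ACL structure: revision, size, ACE count, padding
ACE_FIXED_BYTES = 8       # ACE header (4 bytes) + access mask (4 bytes)
MAX_ACL_BYTES = 0xFFFF    # the ACL size field is 16 bits wide


def sid_bytes(sub_authority_count: int) -> int:
    """Size of a binary SID: 8 fixed bytes plus 4 per sub-authority."""
    return 8 + 4 * sub_authority_count


def max_allow_aces(sub_authority_count: int = 5) -> int:
    """Upper bound on access-allowed ACEs that fit in a single ACL.

    Assumption: one ACE per user, and a domain account SID of the form
    S-1-5-21-a-b-c-RID, which has 5 sub-authorities.
    """
    ace_bytes = ACE_FIXED_BYTES + sid_bytes(sub_authority_count)
    return (MAX_ACL_BYTES - ACL_HEADER_BYTES) // ace_bytes


if __name__ == "__main__":
    print(max_allow_aces())
```

This is a best-case ceiling only. The customer hit the wall nearer 900 entries, so the real DACL was evidently carrying more bytes per user than this simple model assumes; I have not worked out exactly why.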
Problem #2: Citrix internal IT folks are implementing layers of cake and observing “slow” streamed application launch for the first app launch, while the 2nd time launch is fast, as they would expect. WHY is the first launch slow? It shouldn’t be slow; the RadeCache is “mounted” into the space, making all apps close to a 2nd time launch. Sure, the registry would need population, but that shouldn’t be too bad. Good news on that one, by the way: on 5.2, the registry needs population. On not yet released Mako, the registry load is a hive MOUNT, which is much more efficient.
The suspect configuration has Citrix Provisioning Server, XenServer, XenDesktop, App Streaming and Microsoft Office 2007. Variants of this profile have been used for stream to physical machines for 1000 users for a year or so now, including the stuff I’m using to write this post. We know the profile is “good”.
First time app launch slow
Launch debugger: Observe source code and binaries as it steps through, see a few neat things.
1) Streaming client is doing a DIR /S on the RadeCache for each sandbox create. Wow! That sucks and shouldn’t be there, but it doesn’t explain things being slow. For those wanting to know more, it is this code that calculates the cache utilization and decides when to dispatch the cache reaper. Ignore this – more digression.
2) Streaming client is setting DACL to grant user access to the cache. Holy crap, how long did that take!
In WinDBG, you hit “F10”, “F10”, “F10” to step over code. It usually takes about as long to get to the next line of code as it does to release the F10 key. In the case of the API to set the DACL, the machine “froze”. I mean, NOTHING – for many seconds! Wassup?
I left the room all happy with myself telling the IT guys that something is completely hosed in your enterprise disk stuff because that should be “instantaneous”. The DACL addition is a SINGLE DACL addition for a SINGLE user, for a SINGLE directory, there’s no way that should have taken SECONDS.
Recall that the RadeCache space was already populated, so the streaming system was merely granting a user access; easier than that, it was only setting a single DACL on a GUID_V sub-directory, which would propagate into the Device\C\Program files\… spaces of the execution image. SECONDS! You’re joking.
Being smart people, the IT folks didn’t let it go.
One of my favorite books
One of my favorite books is “Zen of Code Optimization”, by Michael Abrash. Yes, that’s “Zen” with a ‘Z’. I haven’t had the chance to meet Michael, but when I do, well, beer will be on me; this dude knows his stuff. He is behind a bunch of neat things like the graphics engine for Quake and, generally, he knows how to make a computer do things efficiently.
One of the best pieces of the book is “Chapter 3 – Assume Nothing”. Yes, this is a whole chapter. The gist of this is that just because you THINK it will be fast doesn’t mean that it is fast; you must MEASURE IT. The corollary is that just because you have 4 years of a product in the field saying that it is fast doesn’t mean that it really is. In this case, I’ve paid the penalty for “The Costs of Ignorance”. This is outlined in excruciating detail on page 27.
DACLs and Inheritance
The ICACLS command and the streaming code of reference have the same behavior. It comes down to this:
- icacls GUID_V /grant domain\username:(CI)(RX)
In THEORY, this sets a SINGLE inheritable right at the top of the execution cache to give the named user the ability to read files in that space, scan directories and execute content. This is SUPPOSED to be what happens. In theory, any CreateFile access to a file/directory below that space will then be influenced by the inherited rights from the higher directory, where we set the DACL. This is what inheritance means: the higher directories have it, and this means that the subdirs do too. Start at the root and work your way down, and permissions follow! SIMPLE! Four years of hindsight now says that this isn’t actually what happens!
I’m amazed – chapter 3 coming to bear.
What really happens
First – I checked with a bunch of certified smart people and none could find a hole in the programming of the DACL set. The code is “perfect” as coded – but it’s slow…
Instead of doing what you tell it to do, the Windows API gets “elegant”. I didn’t ask it to inspect the subfolders and files, but it does! That is stuff that should exist in the Windows Explorer shell and maybe in icacls.exe, but it dag gone tootin shouldn’t happen automatically in the API version! I already KNOW that the DACLs for the sub-directories and files are good, don’t help me out by fixing stuff that ain’t broke!!!
Instead of adjusting the rights of a single subdirectory, the system RECURSES to all files and subdirectories.
In the ITDev case, the subdirectories of the indicated space contain the mounted, FULL installation image of MS Office 2007, over 7000 files!
On a local machine, as in true physical hardware with true physical disks, you hardly notice; it’s “short” time. On a Virtual Machine, … it matters! I mean, it doesn’t matter for one file, but if you multiply by 7000, it adds up!
Recall that the execution machine is Provisioning Server booted, XenServer execution, some fancy disk system, everything is virtual to the end. The files that “exist” as local in the RadeCache don’t really exist, they are merely described to the virtual machine as local, indeed, there’s no such thing as “local”, ever for virtual machines.
For each of these files, the Windows API did its stuff and recursed into all the subdirectories. It read the DACLs for each file/directory, compared them to the DACLs for the directory that we were setting up via the commanded API, determined that “all was good” and then made no adjustment. BUT merely reading DACL information from the files caused disk blocks to get “paged in”, and CPU-wise, this is painfully “slow”, even for virtual CPUs!
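To make the cost model concrete, here is a toy sketch of that behavior. It is an in-memory model, not the Windows implementation: `Node`, `build_flat_tree` and `grant_with_propagation` are names I made up, and the “DACL” is just a Python set. The point it illustrates is that a propagating grant visits every object below the directory, even when nothing needs to be written.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a file or directory and the ACEs on its DACL."""
    name: str
    aces: set = field(default_factory=set)
    children: list = field(default_factory=list)


def build_flat_tree(file_count):
    """Hypothetical GUID_V directory holding file_count files."""
    root = Node("GUID_V")
    root.children = [Node(f"file{i}") for i in range(file_count)]
    return root


def grant_with_propagation(root, ace):
    """Grant ace on root and push it down the whole tree.

    Returns (visited, written): every object is read and compared,
    but only objects that lack the ACE are actually written.
    """
    visited = written = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1              # reading the DACL costs I/O either way
        if ace not in node.aces:
            node.aces.add(ace)
            written += 1
        stack.extend(node.children)
    return visited, written


if __name__ == "__main__":
    tree = build_flat_tree(7000)
    print(grant_with_propagation(tree, "user:(CI)(RX)"))  # first grant
    print(grant_with_propagation(tree, "user:(CI)(RX)"))  # nothing left to fix
```

In the second call nothing is written, yet the visit count is unchanged. On virtualized storage, each of those visits is a read that has to come from somewhere.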
For small apps, nobody will ever notice. For a monster like 2GB MS Office 2007 at 7000 files, they notice!
Recall that this only happens on sandbox create and destroy, so it affects FIRST time app launch and not second time app launch.
Step 1: Prevent the streaming service from adjusting rights to the GUID_V directories. I did this with a code change, but the same thing can be simulated by removing a “right” from the streaming service user on install. Not for the faint of heart. Note that the service will “push on” should the DACL adjustment “fail”. You can read that as “it doesn’t check the return code”.
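As an aside on that unchecked return code, here is a minimal sketch of what checking it could look like. This is not the streaming service’s code (that calls the Windows API directly); it shells out to the same icacls command shown earlier, and it assumes icacls signals failure with a nonzero exit code. The `grant_read_execute` name and the injectable `run` parameter are mine, added so the logic can be exercised without touching real permissions.

```python
import subprocess


def grant_read_execute(path, account, run=subprocess.run):
    """Grant account (CI)(RX) on path via icacls and fail loudly on error.

    `run` defaults to subprocess.run; a stand-in can be passed for testing.
    Returns the command that was executed.
    """
    cmd = ["icacls", path, "/grant", f"{account}:(CI)(RX)"]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure here, instead of discovering it later
        # when the application itself fails to start.
        raise RuntimeError(
            f"DACL grant failed ({result.returncode}): {result.stderr}"
        )
    return cmd
```

Had the failure in Problem #1 been surfaced at this point, the launch error would have pointed at the DACL rather than at CreateProcess.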
Step 2: Grant “everyone” R+X access, (CI) for the whole RadeCache directory. This allows users to see and execute the contents of the RadeCache, eliminating the security advantage of hiding this space, but it also allows users to RUN THINGS if the DACLs aren’t adjusted.
Getting the layers of cake going on XenDesktop takes work. It also takes experiment and “doing it” yourself to experience some of the problems that arise at scale. We’re doing that work and doing it with some pretty massive applications.
For small apps, you’ll never notice that this delay exists, but for large apps, it’s a noticeable delay. We’re attacking it. Look for changes in this area in a Parra/Mako + 1 streaming client. Notice I didn’t say Mako.
The good news near term is that it only comes up on a FIRST time app launch. Second app launches are “fast”, like on real hardware. The bad news is that on pooled desktops, the first app launch of each logon is a first time app launch. Populating the RadeCache makes most of that time disappear, but parts still exist. Sandbox reuse helps, but isn’t the only part of this puzzle.
The other thing to observe is that some of the delay in first time app launch is loading the installation image registry content. This is a registry load operation on 5.2, which was the test environment. In Mako/Parra, if you load/save the profile targets, this will be a registry MOUNT – which is much more efficient – even more importantly, it doesn’t involve registry “writes”, which is a winner.
My compliments to the Citrix IT folks for not letting go.
Fun for your experiments
Create a directory with no files. Time this command:
- icacls pathtodir /grant localmachinename\username:(CI)(RX)
It will be fast.
Add 7000 files to the directory by copying \program files. Repeat the experiment. Amazing results…
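If you would rather script the experiment, here is a small harness sketch. It times any action on an empty temporary directory and then again after filling it with files. Note the differences from the recipe above: it generates one-byte files spread over subdirectories instead of copying \program files, and all of the function names are mine. The `icacls_grant` helper only works on Windows and needs an account name that you supply.

```python
import os
import subprocess
import tempfile
import time


def populate(directory, count, per_dir=100):
    """Create `count` one-byte files, spread across subdirectories."""
    for i in range(count):
        sub = os.path.join(directory, f"dir{i // per_dir:03d}")
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, f"file{i:05d}.bin"), "wb") as fh:
            fh.write(b"x")


def timed(action, directory):
    """Return the seconds that `action(directory)` takes."""
    start = time.perf_counter()
    action(directory)
    return time.perf_counter() - start


def run_experiment(action, file_count=7000):
    """Time `action` on an empty directory, then on a populated one."""
    with tempfile.TemporaryDirectory() as directory:
        empty_seconds = timed(action, directory)
        populate(directory, file_count)
        populated_seconds = timed(action, directory)
    return empty_seconds, populated_seconds


def icacls_grant(account):
    """Build an action that runs the icacls grant from this post (Windows only)."""
    def action(directory):
        subprocess.run(
            ["icacls", directory, "/grant", f"{account}:(CI)(RX)"],
            check=True, capture_output=True,
        )
    return action
```

On a Windows box, something like `run_experiment(icacls_grant(r"MYMACHINE\someuser"))` returns the two timings, where the machine and user names are placeholders. Tiny files on a fast local disk may understate the gap compared with a real install image on virtual storage.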
Offline, I have received some compliments for sharing some of the “faults” in the streaming system. I hope this post contributes positively to your XenDesktop implementations and to your understanding of exactly how things are running. I find it helps to really know what’s going on.
Citrix Systems Product Architect – Application Streaming