Shortly after I wrote my StoreFront 2.6 Scalability article, we released StoreFront 3.0. And we recently completed our first internal round of hard-core performance & scalability testing, so I have some results and updates to share.

Let’s get to it, shall we?

SSS, General Sizing & Deployment Guidance

This hasn’t changed a ton from 2.6 to 3.0, although we’ve made some great performance improvements across the board in 3.0 and can now support about 10-20% more connections per StoreFront box. I’d still recommend starting with 2 or 3 StoreFront nodes, each with 4 vCPUs and 8 GB RAM, which should get you to about 150-200k connections per hour (with a logon rate of 50 requests per second). And where we previously recommended “capping” the number of nodes in a server group at 5, we are now comfortable supporting up to 6 nodes in a single server group (I need another entire article to explain the “why” behind this, but just trust me for now). That VM spec still seems to be the sweet spot and will get most customers where they need to be. So what has changed then?
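If you want to sanity-check how a logon rate maps to hourly connections, here is a quick back-of-the-napkin sketch in Python. The function names are mine and purely illustrative, and it assumes a perfectly steady rate for the whole hour, which real logon storms never are.

```python
SECONDS_PER_HOUR = 3600


def connections_per_hour(logons_per_second: float) -> int:
    """Hourly connections for a steady logon rate (assumes no peaks or lulls)."""
    return int(logons_per_second * SECONDS_PER_HOUR)


def logon_rate_needed(target_connections_per_hour: int) -> float:
    """Steady logons per second needed to reach an hourly connection target."""
    return target_connections_per_hour / SECONDS_PER_HOUR


# The 50 requests per second quoted above works out to 180,000 per hour,
# which sits inside the 150-200k range.
print(connections_per_hour(50))
```

Going the other way, logon_rate_needed(200_000) comes out to roughly 55.6 logons per second, so the top of the range needs a little more than 50 per second sustained.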

Auto-Provisioned Apps with RfW

The scalability of auto-provisioned apps with Receiver for Web (RfW), where StoreFront is rolled out to new users, has been significantly improved in 3.0 compared to 2.6. We made some core tweaks to StoreFront that reduced the number of roundtrips to the various Delivery Services, which in turn improved response times by 80% and overall system throughput by 140%! To put this in perspective, if you have 5 auto-provisioned apps, you can now support somewhere in the neighborhood of 125k connections per hour in 3.0 (compared to 60k in 2.6). And if you have 100 auto-provisioned apps being rolled out to new users, we can now achieve roughly 15k connections per hour, whereas with 2.6 we actually struggled to consistently log on users and enumerate resources, and experienced failures from time to time. So this is a big improvement worth noting and very important for those using RfW with auto-provisioned apps.

Garbage Collection

We realized in SF 2.6 that we had an issue with overall system throughput. As it turns out, we were using the default Workstation Garbage Collection (GC) mode. So one of the key changes we made in the 3.0 release was to implement Server GC, which is actually a recommended practice for ASP.NET applications on multi-core servers. This increased throughput by anywhere from 5% to 28%, depending on the component tested.

Memory Consumption

If you remember from my last article, I said that RfW required a lot more memory for each user/resource versus Native Receiver. We are happy to report that we’ve worked hard to reduce the memory required for each user/resource from 3 KB in 2.6 to 650 bytes in 3.0! As a result, RfW scalability is now a lot closer to Native Receiver scalability (only ~15% difference in 3.0).
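To get a feel for what that per-user/resource reduction means in practice, here is a rough Python sketch. The user and resource counts are made up for illustration, the helper name is hypothetical, and I’m reading “3 KB” as 3 × 1024 bytes, which is an assumption.

```python
BYTES_PER_ENTRY_SF26 = 3 * 1024  # "3 KB" per user/resource in 2.6 (read as KiB; assumption)
BYTES_PER_ENTRY_SF30 = 650       # 650 bytes per user/resource in 3.0


def rfw_memory_bytes(users: int, resources_per_user: int, bytes_per_entry: int) -> int:
    """RfW memory for a population where each user enumerates the same number of resources."""
    return users * resources_per_user * bytes_per_entry


# Hypothetical environment: 10,000 users, 50 resources each.
users, resources = 10_000, 50
mem_26 = rfw_memory_bytes(users, resources, BYTES_PER_ENTRY_SF26)
mem_30 = rfw_memory_bytes(users, resources, BYTES_PER_ENTRY_SF30)

print(f"2.6: {mem_26 / 1e6:.0f} MB, 3.0: {mem_30 / 1e6:.0f} MB, "
      f"reduction factor: {mem_26 / mem_30:.1f}x")
```

Under those assumptions the per-entry footprint drops by a factor of a little under 5, regardless of how many users or resources you plug in.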

Credential Wallet

Thankfully, this is something we caught after the 2.6 release but before 3.0 went out the door. We found an issue with the Credential Wallet (CW) service under extremely high load. More specifically, we ran into a bottleneck in the CW service once a single SF 2.6 server had issued approximately 120k auth tokens at any given time (basically, you were limited to approximately 120k active user sessions). Luckily only 1 or 2 customers in the world ran into this issue. But we are happy to report that the issue with the CW service has been fixed in the 3.0 release and we have successfully tested up to 400k user authorization tokens.

X1

Now that Receiver X1 is out, we wanted to look at how it impacts StoreFront scalability. As expected, the “Day 1” impact on scalability is fairly substantial as we’re downloading ~120 total files. Compared to the Web Receiver API testing, StoreFront throughput is reduced by almost 100% when logging on via X1 and fetching a Receiver website at a rate of 100 requests per second. It is important to keep in mind that these results are only valid for the Day 1 scenario where every user downloads the entire Receiver website. On subsequent days or logons, the site would be cached and neither scalability nor throughput would be impacted. As with RfW, environments using X1 should be designed to allow an extra 650 bytes per resource on top of the base 4 GB memory requirement for StoreFront 3.0. This is one of the reasons I’m recommending 8 GB for every StoreFront VM out of the gate. One other note – we enabled Integrated Caching on the NetScaler for this particular X1 test so we could provide caching of static content such as JS, CSS, JPG and GIF files.
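Here is a minimal Python sketch of that memory guidance, checking whether a given population fits in a StoreFront VM. It assumes the 650 bytes applies per user per enumerated resource, treats GB as 1024^3 bytes, and ignores everything else running on the box, so take it as a rough planning aid rather than a sizing tool. The function names are hypothetical.

```python
GIB = 1024 ** 3
BASE_MEMORY_BYTES = 4 * GIB  # base memory requirement for StoreFront 3.0
BYTES_PER_RESOURCE = 650     # extra memory per resource with RfW or X1


def required_memory_bytes(users: int, resources_per_user: int) -> int:
    """Base requirement plus per-resource overhead (assumed to apply per user)."""
    return BASE_MEMORY_BYTES + users * resources_per_user * BYTES_PER_RESOURCE


def fits_in_vm(users: int, resources_per_user: int, vm_memory_gib: int = 8) -> bool:
    """True if the estimate fits in the VM's RAM (default: the recommended 8 GB)."""
    return required_memory_bytes(users, resources_per_user) <= vm_memory_gib * GIB


# Hypothetical populations, 100 resources per user.
print(fits_in_vm(20_000, 100))   # about 1.3 GB of overhead on top of the base
print(fits_in_vm(100_000, 100))  # about 6.5 GB of overhead on top of the base
```

With the default 8 GB VM, the first population fits comfortably and the second does not, which gives you a sense of where the extra 4 GB of headroom runs out.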

Future Testing – Site Aggregation, PNAgent, IMA/XML

Of course, there is always more work to do. We’ve started looking at some other advanced scenarios, such as how multiple stores and site aggregation affect StoreFront scalability, how legacy PNAgent affects scalability, and how all of these numbers might change if we’re enumerating 6.5-based IMA/XML farms versus FMA-based sites (which all this testing has been based on to date). Once we put these things through our performance lab and have some numbers, I’ll be sure to provide another update.

Special Thanks

Once again, a special thanks and shout-out to our System II Testing team in the UK led by Martin Rowan. OlgaK also deserves a ton of credit for this StoreFront testing in particular. I just interpret a lot of the test results and come up with sizing recommendations and leading practices, which is the easy part IMO. All the hard work and months of testing are performed by Martin’s team, and none of this would be possible without them.

Cheers, Nick
Nicholas Rintalan, Lead Architect & Director – Americas, Citrix Consulting Services (CCS)