Because this is the third article in this farm/site/zone design series, and since it appears I get around to providing an update on this topic every 2 years, I guess this is now a biennial tradition.
In case you missed the first couple, let me provide an executive summary for you:
- XenApp Farm and Zone Design (v2013). This was written almost 5 years ago and was based primarily on the IMA platform. I covered the various options we have to architect global XenApp farms and I tried to plant a seed that multiple farms are definitely OK, if not my preferred approach when you have mission-critical workloads and uptime or availability is priority #1.
- XenApp/XenDesktop Site Design (v2015). This was written after 7.6.1 was released, so it was based on the FMA platform. I basically said that we didn’t have zones yet and the new LHC was still being developed, so our only real answer was multiple sites. There actually wasn’t much up for discussion at the time.
Fast-forward to late 2017 and we’ve now released somewhere in the neighborhood of SEVENTEEN FMA-based XenApp/XenDesktop releases if you can believe it. And the FMA platform has come a LONG way since we released XenDesktop 5.0, which was actually the first release leveraging FMA instead of IMA (not sure if you remember, but XenDesktop versions 2-4 actually leveraged IMA!).
A lot has happened since then, and we just shipped XenApp & XenDesktop 7.15 LTSR, which I think is going to be a monster release. Over the last year or two in particular, we also added some important features that are key to this architecture design discussion. So, I’m going to attempt to clarify our stance on some of these items and answer a few of the more common questions we’re getting these days. Now that zones are back, does Citrix Consulting recommend them? Is the new LHC ready for prime time? When should you do multiple sites versus a single site with multiple zones?
The first thing I want to clarify is that zones are back and better than ever. Yes, when zones made their initial debut in the 7.7 era, they weren’t quite usable in the context of global design, so a lot of people weren’t very hot on them (myself included). But I’m here to tell you that we did some incredible work over the last few Current Releases, and zones now scale just like the IMA zones you know and love. Beginning with 7.11, we can tolerate high latencies such as 250 ms if we need to. The other thing to point out is that Zone Preference is back too. Remember that ZPF feature from the 6.5 days? Well, we slipped it into 7.11 and it also makes using FMA zones more of a reality. So, the moral of the story here is we’re certainly OK with using zones again.
FMA is a database-driven architecture, so the SQL database is very, very important in case you haven’t heard. But we all know that databases fail, and technologies like SQL Server AlwaysOn Availability Groups (AOAG) can be expensive, especially in a pod architecture model. That’s why we knew we had to re-develop some sort of Local Host Cache (LHC) capability like we had with IMA, so that we’re still operational even if the database is unavailable. This “adventure” started with Connection Leasing, which we recently announced is deprecated. We’re moving away from Connection Leasing because the new LHC replaces it and is far superior technology. The new LHC, or LHC 2.0, debuted about a year ago. And similar to zones 2.0, there were some skeptics once people dug into the new form of LHC in FMA. And rightfully so: it had a couple of bugs out of the gate and it was limited to 5k VDAs, so it didn’t quite satisfy our larger customers. But over the last couple of Current Releases, we’ve continued to make it better and better. Those early bugs are now gone and we’ve doubled the scalability of the LHC to 10k VDAs with the 7.14 release. So, the moral of the story here is we’re recommending the new LHC now and you need to move away from Connection Leasing ASAP.
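If you want to sanity-check a design against the LHC numbers above, a tiny helper like the following might do. This is only a sketch: the function names are made up for illustration, the two limits are simply the 5k and 10k figures quoted above, and it assumes the release you pass in actually ships the new LHC. It also says nothing about whether the limit applies per zone or per site, so check the product documentation for that.

```python
def lhc_vda_limit(release: str) -> int:
    """Return the LHC VDA limit quoted in this article for a given release.

    Assumption: `release` is a "major.minor" string such as "7.15" for a
    version that includes the new LHC. 7.14 and later get the doubled limit.
    """
    major, minor = (int(part) for part in release.split(".")[:2])
    return 10_000 if (major, minor) >= (7, 14) else 5_000


def fits_lhc(vda_count: int, release: str) -> bool:
    """True if the planned VDA count stays within the quoted LHC limit."""
    return vda_count <= lhc_vda_limit(release)


if __name__ == "__main__":
    for release in ("7.12", "7.15"):
        print(release, lhc_vda_limit(release), fits_lhc(8_000, release))
```

Comparing versions as integer tuples rather than strings avoids the classic trap where "7.9" sorts after "7.14".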
One Site or Multiple Sites?
So, here’s the million dollar question. You have data centers and users all over the world and you’ve got XenApp 7.15 at your disposal; should you implement a single global site with multiple zones, or multiple sites? Are there any scalability limitations, and when should sites be capped? The answer is “it depends” of course. 😉 But I do want to make a couple of points and attempt to describe what it depends on. Usually this is not a technical decision, it’s a business decision. No, there is no magic number like 20k concurrent users for when to split up a site. Sizing actually has more to do with how fast people log in than with how many users you might see in the steady state. And we have no scalability limitations on sites, so the sky is the limit there, and we have no latency restrictions anymore either, so we’re good to span sites across oceans if we want.
But just because you can go with a single site doesn’t mean you should! If you interview key stakeholders or “the business” and they tell you that uptime or availability is more important than management or administration, then I’d argue you should adopt a pod architecture and go with multiple sites all day long. If the IT folks tell you they have crazy logon storms or shift-worker scenarios, you might also opt for multiple sites. If they have mission-critical workloads and flexibility when doing upgrades is key, then that should also point you in the multiple site direction. But let’s say “the business” and the IT folks tell you that they don’t have mission-critical workloads, they would rather simplify administration and their PoSh skills are weak. Then I might recommend a single global site. So, it does depend, but these are some of the variables or factors it depends on.
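The factors above can be boiled down to a rough checklist. The sketch below is a deliberate simplification and not an official Citrix Consulting tool: the `Priorities` class, its field names and the rule that any single factor tips the scale toward multiple sites are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Priorities:
    """Answers from stakeholder interviews (all field names are hypothetical)."""
    uptime_over_ease_of_admin: bool   # availability matters more than management
    heavy_logon_storms: bool          # logon storms or shift-worker scenarios
    mission_critical_workloads: bool  # outages are not tolerable
    needs_flexible_upgrades: bool     # wants to upgrade one pod at a time


def recommend_design(p: Priorities) -> str:
    """Suggest 'multiple sites' if any factor named above points that way."""
    reasons_for_pods = [
        p.uptime_over_ease_of_admin,
        p.heavy_logon_storms,
        p.mission_critical_workloads,
        p.needs_flexible_upgrades,
    ]
    return "multiple sites" if any(reasons_for_pods) else "single global site"


if __name__ == "__main__":
    hospital = Priorities(True, True, True, True)
    back_office = Priorities(False, False, False, False)
    print(recommend_design(hospital))     # multiple sites
    print(recommend_design(back_office))  # single global site
```

In real life these answers are rarely a clean yes or no, so treat the output as a conversation starter rather than a verdict.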
Examples always help. I know one healthcare customer that has a total of 12k CCUs, and because they can’t afford an outage that disrupts patient care, we’ve got them running 4 pods or sites with ~3k CCUs each. An outage of any one site only takes out 25% of the user population, since we’ve reduced the size of the failure domain by leveraging this multiple site approach.
Another customer has 23k CCUs, and since their focus was ease of management, they’ve opted for a single site. This customer simply has different business priorities and risk tolerance, and there isn’t anything preventing us from scaling even higher with one site.
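To put numbers on the failure domain for these two customers, here is a quick back-of-the-envelope calculation. It assumes an outage takes down one entire site and nothing else, and it treats the healthcare pods as exactly 3,000 CCUs each; the function name is just for illustration.

```python
from typing import Sequence


def worst_case_outage_share(ccu_per_site: Sequence[int]) -> float:
    """Share of all concurrent users lost if the largest single site fails."""
    total = sum(ccu_per_site)
    if total <= 0:
        raise ValueError("need at least one site with users")
    return max(ccu_per_site) / total


if __name__ == "__main__":
    healthcare = [3_000] * 4   # 12k CCUs split across 4 pods
    single_site = [23_000]     # 23k CCUs in one site
    print(f"{worst_case_outage_share(healthcare):.0%}")   # 25%
    print(f"{worst_case_outage_share(single_site):.0%}")  # 100%
```

The single site is not "wrong" here. That customer simply accepted a larger failure domain in exchange for easier management.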
The last thing I want to point out is that sites and zones are not mutually exclusive. You can use them in combination with each other and that’s actually what we’re recommending more and more these days. Let’s say we have a customer that has 6 data centers located in Los Angeles, Sydney, Tokyo, Amsterdam, London, and Rome. I’d probably recommend 3 sites in this scenario and use zones to “group” the data centers within each major continent or region. That means one site for North America with Los Angeles, a second site for the APAC/APJ region with 2 zones for Sydney and Tokyo, and a third site for Europe with 3 zones for Amsterdam, London and Rome. So, it ends up being a sort of hub-and-spoke architecture leveraging multiple sites and zones.
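Written down as data, that example layout might look something like this. The site names are placeholders I made up; only the data centers and their grouping come from the scenario above, with each data center modeled as one zone.

```python
# Site name -> zones (one zone per data center). Site names are hypothetical.
TOPOLOGY = {
    "North America": ["Los Angeles"],
    "APAC": ["Sydney", "Tokyo"],
    "Europe": ["Amsterdam", "London", "Rome"],
}


def site_for(data_center: str) -> str:
    """Look up which site a data center (zone) belongs to."""
    for site, zones in TOPOLOGY.items():
        if data_center in zones:
            return site
    raise KeyError(data_center)


if __name__ == "__main__":
    for site, zones in TOPOLOGY.items():
        print(f"{site}: {len(zones)} zone(s): {', '.join(zones)}")
    print(site_for("Tokyo"))  # APAC
```

Three sites keep each failure domain regional, while the zones inside each site keep day-to-day administration for that region in one place.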
So, that’s your 2017 update: zones, ZPF and the LHC are back and better than ever. We now have the tools at our disposal to implement global farms or “sites” again. But just because you can, doesn’t mean you should! It’s best to evaluate your organization’s business priorities and design your Citrix environment accordingly. And usually that ends up being multiple sites (with maybe some zones sprinkled in) to ensure maximum uptime and availability, which seems to be the #1 priority for customers these days.
Nick Rintalan, Principal Architect, Citrix Consulting Services (CCS)