A couple of years ago, I wrote an article called “XenApp Farm and Zone Design,” which was based on the IMA architecture and was specific to XenApp, as the article’s name implies. This is, essentially, “Part 2” of that article, so if you haven’t had a chance to read the first one (or you can’t remember what I was arguing in that article), then please go check out Part 1 first.
Here, I am going to talk about FMA Site Design, which applies to both XenApp & XenDesktop 7.x, which use the FMA architecture. And really I’d like to shed some light on some interesting designs we’re doing in the field when there is more than one data center. Because you actually have options, despite what you’ve probably been told. 😉
The Absence of Zones in FMA
No, the current shipping version of XenApp & XenDesktop (7.6 FP1) does not have a “zone” feature similar to what we had in IMA. And a “site” in FMA is really analogous to a farm in the IMA world. So, when there are multiple data centers in the mix, we have to implement multiple FMA sites, right? That is certainly what our documentation says (buried towards the bottom of this page in eDocs we essentially tell you to implement 1 site per data center and leverage StoreFront aggregation). And if you call Support, that is probably what they’ll tell you as well (“you have to create a separate site for each data center to be officially supported”).
But what if you have 2 data centers that are connected via dark fiber? What if those 2 data centers are in the same city, but literally across town from each other? What if the latency between your data centers is sub-5 ms? What if it is 50 ms? 100 ms? Where does the VDA-DDC and DDC-SQL communication within FMA really break down and start to cause performance degradation? Those are the questions a few of our customers have been wondering as they make the transition from IMA to FMA, so our Consulting and Product teams decided to dig a little deeper and figure it out.
As it turns out, we have quite a few customers with “well-connected” data centers within close proximity of one another, and we really can get away with a single site.
This is the tricky part: defining just what the heck “well-connected” means. Because if you’ve got 2 data centers that are well-connected to each other, you absolutely can get away with a single FMA site (and I would argue it is fully supported if it meets the requirements I’m about to lay out). Most vendors and industry experts seem to agree that well-connected means the data centers are connected via a high-speed link and that link has very low latency. But what do “high-speed” and “very low” mean in this context? What does “close proximity” really mean? It will vary slightly depending on who you talk to, but most folks seem to agree on the following:
- High Speed Network Link = 1 Gbps+
- Very Low Network Latency = sub-5 ms
- Close Proximity = 50 miles or less
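To make the rule of thumb above concrete, here is a minimal sketch that classifies a data center pair as “well-connected” per those three criteria. The function name and parameters are my own illustration, not anything in the product:

```python
def is_well_connected(link_gbps: float, latency_ms: float, distance_miles: float) -> bool:
    """Rule-of-thumb check for whether a DC pair can share a single FMA site.

    Thresholds come from the criteria above; this is an illustrative
    sketch, not an official Citrix sizing tool.
    """
    return (
        link_gbps >= 1.0            # high-speed network link: 1 Gbps+
        and latency_ms < 5.0        # very low network latency: sub-5 ms
        and distance_miles <= 50.0  # close proximity: 50 miles or less
    )

# The dark fiber example below: 15 miles apart, ~3 ms average latency
print(is_well_connected(link_gbps=1.0, latency_ms=3.0, distance_miles=15.0))  # True
```

If all three checks pass, you are in the territory where (in my opinion) a single site spanning both data centers is a fully supported design.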
So, if you have 2 data centers connected via dark fiber and they are, say, 15 miles apart and the average network latency is 3 ms, then you can definitely treat those as one logical data center if you so choose (and implement 1 FMA site). We’ve done this a number of times already in the field and there are honestly no performance issues whatsoever. And again, I think this is a fully supported scenario and you can point to this article if someone tells you otherwise.
Where it becomes a little grayer is when you have two data centers that are connected via 10 Mbps and say 30 ms latency (or even 1 Mbps and 100 ms latency). What is the tipping point and when should you definitely implement multiple FMA sites?
First off, I have to say that if you don’t meet the requirements I outlined above, then you’re not going to be officially supported by Citrix. This may change in a future version of XenApp/XenDesktop, but with 7.6, if you decide to do what I’m about to tell you, then you’re taking a risk, as you’re relegated to “best effort” support.
Now that the disclaimer is out there … we have a few customers who have the classic branch office scenario or “mini” data centers (DCs). These branches or mini DCs have data that can’t be migrated to a central office/DC, but at the same time, there’s neither a ton of data nor a ton of users, so we don’t need a ton of workers/VDAs to support the load. These scenarios are perfect for extending a single FMA site to distributed sites that are not-so-well-connected to the main DC (where SQL and the Brokers live).
So, what does “not-so-well-connected” mean and where does it fall over? After running this through the lab with a WAN emulator, testing literally dozens of different link-speed and latency combinations (and also proving this out in the field at a few willing customers!), we found that things start to deteriorate if you exceed 50 ms latency or have less than 256 kbps bandwidth. And while I mentioned bandwidth/speed there, it really didn’t play as big a factor as we thought, and we were even getting decent results with ~100 kbps! Like most applications, it’s really all about the latency.
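The tipping point above can be sketched as a simple check. This is just an illustration of the lab findings (the thresholds are from our testing; the function name is my own), not a guarantee for any particular environment:

```python
def extended_site_viable(latency_ms: float, bandwidth_kbps: float) -> bool:
    """True if a remote branch/mini DC falls within the tested comfort zone
    for extending a single FMA site across the WAN.

    In our lab runs, things started to deteriorate beyond ~50 ms latency
    or below ~256 kbps bandwidth. Latency mattered far more than
    bandwidth (we saw decent results even at ~100 kbps), so treat the
    bandwidth floor as the softer of the two limits.
    """
    return latency_ms <= 50.0 and bandwidth_kbps >= 256.0

print(extended_site_viable(latency_ms=50.0, bandwidth_kbps=256.0))   # True: at the edge, but workable
print(extended_site_viable(latency_ms=100.0, bandwidth_kbps=100.0))  # False: where we saw real degradation
```

Remember the earlier disclaimer applies: passing this check keeps you inside what we tested, not inside what Citrix officially supports for 7.6.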
So, what did we test exactly and what should you expect if you do this? Well, that’s a bit out of scope for this article (and maybe I can do a follow-up article with all the gory details if folks are interested), but a few things I’ll highlight:
- We measured the time it took VDAs to register and receive the list of Brokers (it just about doubles at 50 ms/256 kbps)
- We measured resource (app/desktop) enumeration times (there was almost no impact at 50 ms/256 kbps, but enumeration would double or triple at, say, 100 ms/100 kbps)
- We measured launch/login times (this added about 10-12 seconds to our baseline of 15 seconds at 50 ms/256 kbps)
One thing I want to point out is that these initial tests were focused on analyzing what happens when the VDAs were geographically distributed from the DDCs and SQL. We also did a second round of tests analyzing what happens when you introduce latency between the DDCs and SQL (imagine a secondary DC that is large enough to warrant “local” Brokers, but SQL is still centralized in your primary DC). The current shipping version of FMA actually works in that scenario as well, but things start to break down at different network latencies and speeds. I’ll save the findings from the remote SQL tests for another day since I don’t want to lose you.
One Site or Two?
So … should we always implement a single site if we have 2 data centers in close proximity with high bandwidth and low latency? How about if we’re under 50 ms latency? Just because you can doesn’t mean you should. There is a reason I was arguing to implement multiple farms in Part 1 of this article. We had the option to do zones in IMA, but we opted not to many times, and for good reason.
There is a reason Dan Allen and I get up on stage seemingly every year at Synergy or BriForum and plead with customers to leverage a pod architecture. It really comes down to the scale of the environment, data center architecture, performance, cost and risk tolerance. Not every customer has 100k desktops or users.
Not every customer loses a million dollars a second when they are down. And not every customer prints money. So you really need to figure out the best approach based on your unique business requirements.
The other thing to keep in mind here is that not all secondary (or tertiary or N*) data centers are the same. I just told you it might make sense, in certain situations, to deploy local Brokers at larger satellite DCs. But it doesn’t make much sense for a very small branch office or mini DC. So, I do think extending a single FMA site to smaller data centers or branches (with data) makes sense if cost and simplicity is important to the business. But I don’t think it makes sense if you have a huge environment and your “other” data centers have tons of data or are just as big as your “primary” data center.
I just finished a 7.6 design for a large customer with 3 data centers that all have tons of data and are basically treated equally, and these DCs have 1 Gbps connectivity and less than 30 ms latency to one another. So, we probably could have done a single site, but it really never crossed our minds due to their tolerance for risk and DC architecture. We implemented 3 sites and used NetScaler/StoreFront to aggregate resources, and all is well.
I will say it again: just because you can doesn’t necessarily mean you should. Every environment is different and you have to use your head. I just wanted to point out in this article that FMA is much more resilient than folks might think; just because we don’t have “zones” today doesn’t mean we can’t have remote VDAs or even SQL for that matter. And we have proven this in the field already; this isn’t lab stuff only.
Other Design Questions – PVS, SQL and More…
I really just began to scratch the surface of XenApp & XenDesktop 7.x Design in this article. We covered FMA Site Design, which, in my opinion, is kind of the easy part. What about PVS? How tolerant is that in terms of latency? Can we get away with 1 PVS “farm” spanning 2 data centers as well? What about those remote SQL tests I mentioned above? When should we deploy local Brokers? Should we implement AlwaysOn or stick with mirroring or clustering or what? How about all that backend application and user data? Should a synchronous replication scheme be implemented? Should we go active/active or active/passive?
So many questions and so little time. I’ll tell you what: you can either hire our Consulting team to figure this out for you … or you can wait for my follow-up articles on these topics (just note that it took me 2 years to write this follow-up). 😉
The Future of FMA and Zones
One last thing: the future looks bright in terms of Site Design and what we’re doing with FMA. We’re working on some interesting things for XenApp/XenDesktop. I can’t comment on the specifics or timing of course, but I can say that we’re going to make this multi-data center stuff even easier for our customers in the future. We’re working on core protocol enhancements to make the architecture more resilient at higher latencies. After all, we want to be able to extend this stuff to the cloud one day, as well. Our Desktops & Apps team is talking more and more to our Workspace Services/Cloud team. If you’re familiar with the CWC architecture and what we’re doing there, then you can probably use your imagination for what the future might hold in terms of the FMA architecture.
Thanks for reading and I hope this helps in your travels. And if you’ve happened to implement a single site spanning more than one data center, I’d love to hear from you in the comments (and what some of your network specifics/setup entails).
Nick Rintalan, Lead Architect & Director, Citrix Consulting Services (CCS)