As you may have guessed based on the title, the focus of this post is how the Citrix XenMobile solution can be designed to accommodate a large user population. Before we get too far into that, let’s clarify the scope. We are not going to discuss the scalability of technologies that XenMobile interfaces with, such as XenApp or XenDesktop. We are also not going to discuss NetScaler or ShareFile scalability in the context of XenMobile, though that may be a topic for a follow-up post. For those of you integrating your XenMobile deployment with StoreFront, our StoreFront Planning Guide (pg. 12) has some good data on how StoreFront scales. You can also check out my previous post if you are looking for more information on how StoreFront integrates with XenMobile.
What we are going to review in this post is how the primary components of the XenMobile solution, namely XenMobile AppController (XAC) and XenMobile Device Manager (XDM), can be architected and scaled to service a large user community. Specifically we are going to examine some of the Reference Architecture numbers for these components and what those numbers really mean when we are planning our deployment.
As a force of habit, a few housekeeping items:
All architecture and scalability numbers below assume that we are designing to sustain the failure of one component of each type (i.e., 1 XDM server and 1 XAC appliance). It is also important to note that the numbers detailed are minimum requirements to be used as a starting point when designing a given environment. Each environment will have a unique workload profile. As such, resources may need to be adjusted based on the results of proactive monitoring.
XAC is capable of two different types of high availability. The first is an active/passive mode in which the primary node services all devices (unless it becomes unavailable, then the secondary node takes over). In this configuration, XAC can service approximately 8,500 – 10,000 devices. The first time I heard those figures I had two immediate questions:
- What does 8,500 or 10,000 devices mean? Are they all connected at once? Or are they devices that might ever connect? If they are connected, what are they doing? Downloading an MDX app? Just policy updates? mVPN?
- What do I do if there are more than 10,000 devices?
Let’s answer question 2 first. If there are more than 8,500 or 10,000 devices, that is when our second HA option comes into play. We need to cluster our XACs rather than leverage the conventional active/passive appliance failover mode. We won’t spend too much time in this post explaining how XAC clustering works, but the key things to know are that we have a cluster head that hosts the master database and cluster nodes that service user connections. We also need to make sure we can sustain a failure even if it is our cluster head. To do so, we combine both HA options. The cluster head should be placed in active/passive mode and we then add cluster nodes as needed to support our anticipated user count (don’t forget your N+1 ;))!
Now back to question 1… How are we defining ‘a device’ when we establish these baselines? In this case, a device is one that connects during a given 60-minute period and attempts one of the following tasks:
- Mobile application download
- Application enumeration
- HDX (ICA) application launch
- SaaS application launch or SAML token generation
- MDX policy fetch
- MDX encryption token retrieval
- Selective wipe / lock check
In a typical production deployment, users would be conducting a combination of these tasks over the course of a day, but just like in a XenApp or XenDesktop deployment, not everyone is going to be doing these things at the same time. Environment-specific considerations such as authentication timeouts, the number of mobile applications deployed and XenApp/XenDesktop integration should be taken into account when mapping the above metrics to your specific environment.
In summary, if you are going to have more than 8,500 devices connecting to your XAC environment each hour, you need to evaluate implementing clustering for XAC. When doing so, it is important to account for HA of the cluster head (active/passive) and N+1 availability for cluster nodes.
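The sizing logic above can be sketched in a few lines of Python. Treat it as a rough planning aid rather than an official tool: the function name is hypothetical, and it assumes each cluster node can carry roughly the same 8,500 to 10,000 devices per hour as an active/passive pair.

```python
import math


def xac_appliance_count(devices_per_hour, per_node_capacity=8500):
    """Estimate how many XAC appliances a deployment needs.

    Assumption: each cluster node handles about as many devices per hour
    as a single active/passive pair (8,500 to 10,000).
    """
    if devices_per_hour <= per_node_capacity:
        # A single active/passive pair is enough; no clustering required.
        return {"clustered": False, "cluster_heads": 0,
                "cluster_nodes": 0, "total": 2}

    # Nodes needed to carry the load, plus one spare (N+1).
    nodes = math.ceil(devices_per_hour / per_node_capacity) + 1
    # The cluster head runs as its own active/passive pair.
    heads = 2
    return {"clustered": True, "cluster_heads": heads,
            "cluster_nodes": nodes, "total": heads + nodes}


if __name__ == "__main__":
    for capacity in (8500, 10000):
        print(capacity, xac_appliance_count(20000, capacity))
```

For 20,000 devices per hour this gives 4 cluster nodes at the conservative 8,500 figure and 3 at 10,000, plus the two cluster heads in each case.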
XenMobile Device Manager
XDM scales a bit differently. The critical metrics associated with XDM scalability are CPU architecture, memory availability and thread count. There is some great info about this in our Reference Architecture, so if you have not yet read it, now is probably a good time to do so. At least take a look at the XDM section (pg. 8) so this will all make a bit more sense.
A typical HA XDM deployment consists of two XDM servers in a Tomcat cluster (not a Windows cluster). If your environment is going to be very large, you might even consider a third. To determine the resource requirements for each of those servers, the following formulas can be used:
Note: Each formula includes a variable ‘N’. N is calculated by taking the device count each server must be able to carry (sized so that the remaining servers can absorb the full load if one fails) and dividing by 5,000. If you are talking fewer than 5,000 devices, start with N=1. As with most other workloads, these metrics will vary somewhat depending on CPU clock speed, processor architecture and whether the server is physical or virtual:
- Total CPU Cores = 2 * N
- Total Memory = 4 GB * N
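As a quick illustration, the two formulas can be wrapped in a small Python helper. This is a sketch under stated assumptions: the function name is hypothetical, N is rounded up to a whole number (the formulas above do not say how to handle fractions), and each server is sized to carry the load that remains after one server fails.

```python
import math

DEVICES_PER_UNIT = 5000  # divisor used to calculate N


def xdm_server_resources(total_devices, servers=2):
    """Return (cpu_cores, memory_gb) for each XDM server.

    Each server is sized so the remaining servers carry the full load
    if one fails. Rounding N up is an assumption of this sketch.
    """
    if servers < 2:
        raise ValueError("sustaining one failure needs at least two servers")

    # Devices each surviving server must carry after one failure.
    devices_per_server = total_devices / (servers - 1)
    n = max(1, math.ceil(devices_per_server / DEVICES_PER_UNIT))

    cpu_cores = 2 * n   # Total CPU Cores = 2 * N
    memory_gb = 4 * n   # Total Memory = 4 GB * N
    return cpu_cores, memory_gb


if __name__ == "__main__":
    print(xdm_server_resources(20000))             # two servers
    print(xdm_server_resources(20000, servers=3))  # three servers
```

With two servers and 20,000 devices, N works out to 4, so each server gets 8 cores and 16 GB. Adding a third server halves the load each survivor must absorb, which drops N to 2.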
Now that we have our basic resource metrics established, we can also do a few things on the Java side to make better use of the memory available. There are 3 key Java metrics to examine, which can be modified from the XDM “Edit Service” interface (screenshot below).
The first is the MaxPermSize, which is the maximum Java memory limit for PermGen memory space. PermGen memory is typically occupied by ‘permanent’ data such as class data. This memory allocation is reserved at the OS level. Next is the Initial Memory Pool, which corresponds to the initial Java heap size. Java heap size triggers Java garbage collection cycles, which help mitigate the risk and impact of memory leaks and can increase efficiency. If the heap size is too small, garbage collection will run more frequently than needed, which can negatively impact performance. Conversely, if the heap size is too large, garbage collection can be inefficient as heap memory has likely already become fragmented. We also do not want the heap size to be set equivalent to our third metric, the Maximum Memory Pool, because then garbage collection would only run once our heap memory is full. This can also really cost us on the performance side. The Maximum Memory Pool specifies the maximum memory that can be occupied by the Java heap. Because the heap could occupy the entire Maximum Memory Pool at any given time, we need to consider this memory space as reserved as well.
Now that we have established a good number of ways that we do not want to size the Initial Memory Pool, how big should it be? Well, the simple answer is “somewhere in between”. Below are some good starting points for each:
- MaxPermSize: 256 – 512 MB
- Initial Memory Pool: 25% of total physical memory
- Maximum Memory Pool: 65-75% of total physical memory
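To turn those percentages into concrete numbers, a short Python sketch like the following can help. The function name is hypothetical. The flag string it prints uses the standard HotSpot options (`-Xms`, `-Xmx` and, on Java 7 and earlier, `-XX:MaxPermSize`); it is an assumption here that the Initial and Maximum Memory Pool fields map onto the first two in the usual way.

```python
def jvm_memory_settings(physical_mb, max_fraction=0.70, perm_mb=256):
    """Starting-point JVM memory values (in MB) for one XDM server.

    physical_mb  -- total physical memory of the server
    max_fraction -- share given to the Maximum Memory Pool (0.65 to 0.75)
    perm_mb      -- MaxPermSize (256 to 512 MB)
    """
    initial_mb = int(physical_mb * 0.25)          # Initial Memory Pool
    maximum_mb = int(physical_mb * max_fraction)  # Maximum Memory Pool
    return {
        "initial_mb": initial_mb,
        "maximum_mb": maximum_mb,
        "max_perm_mb": perm_mb,
        # Equivalent standard JVM flags, shown for reference only.
        "jvm_flags": f"-Xms{initial_mb}m -Xmx{maximum_mb}m "
                     f"-XX:MaxPermSize={perm_mb}m",
    }


if __name__ == "__main__":
    # A server with 16 GB of physical memory.
    print(jvm_memory_settings(16 * 1024))
```

For a 16 GB server this suggests an initial pool of 4,096 MB and a maximum pool of roughly 11.2 GB at the 70% midpoint.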
This leaves us with about 30% or so for the OS (remember that the Initial Memory Pool is a subset of the Maximum Memory Pool). Assuming we have put the SQL server role on a separate machine (which we should for a number of reasons) that should be about right.
Our last topic for discussion is tuning the MaxThreads for port 443 (Android enrollment) and port 8443 (iOS enrollment). This metric can be adjusted during install or from a configuration file after the fact. The MaxThreads metric corresponds to the maximum number of threads that will be opened for a specific port. Let’s say we have this metric set to 400 for port 443. If we have 50 connections, 51 threads (1 listening) will be open. Because Tomcat (the brains of XDM) is capable of multiplexing, this is not to say that if we have MaxThreads set to 400 we can only service 400 concurrent connections. Ideally we want this number to be close to the maximum number of devices that will connect to XDM concurrently. What that number is really depends on your device types and scheduling policies.
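If you want to check what is currently configured, a small script can read the thread limits back out of the connector definitions. The sketch below assumes the configuration file follows the stock Tomcat `server.xml` layout, where each `Connector` element carries a `maxThreads` attribute; the sample XML, its values and the function names are illustrative only and are not taken from an actual XDM install.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in stock Tomcat server.xml layout; values are illustrative.
SAMPLE_SERVER_XML = """
<Server>
  <Service name="Catalina">
    <Connector port="443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="400"/>
    <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="400"/>
  </Service>
</Server>
"""


def max_threads_by_port(server_xml):
    """Map each Connector port to its maxThreads value."""
    root = ET.fromstring(server_xml.strip())
    result = {}
    for connector in root.iter("Connector"):
        port = int(connector.get("port"))
        # Tomcat's documented default is 200 when maxThreads is not set.
        result[port] = int(connector.get("maxThreads", "200"))
    return result


def undersized_ports(server_xml, expected_concurrent_devices):
    """List ports whose maxThreads is below the expected concurrency."""
    limits = max_threads_by_port(server_xml)
    return [port for port, threads in limits.items()
            if threads < expected_concurrent_devices]


if __name__ == "__main__":
    print(max_threads_by_port(SAMPLE_SERVER_XML))
    print(undersized_ports(SAMPLE_SERVER_XML, 500))
```

Comparing the configured limits against your expected peak concurrency per port is a simple way to spot connectors that may need a higher MaxThreads value.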
Now that we have discussed each component, let’s run through a quick example of what should be needed if we are talking 20,000 devices:
- XDM – 2 servers, 8 (v)CPU and 16 GB RAM each
- XAC – 1 active cluster head, 1 passive cluster head, 3-4 cluster nodes – 5-6 total XACs, 2 vCPU and 4 GB RAM each
Hopefully the Community finds the above information useful when designing and tuning their XenMobile Enterprise deployments. Feel free to drop me a comment or question below. Also a special thanks to my colleagues Jay Guasch, Raghunandan G and Nick Rintalan for their contributions.