One of the things I talked about in my BriForum presentation a couple of weeks ago in Chicago was dom0 tuning. Not only did I get quite a few inquiries about this topic after my presentation, but I also regularly get questions from fellow field consultants, partners and customers about XenServer scalability in general. So I wanted to take a few minutes today to offer up some (hopefully) valuable tricks to maximize VM density on those XenServer hosts you’re running.
Before I begin, I need to state that I’m going to be primarily focusing on single server scalability in this article. If you’re looking for guidance around resource pool size or how to carve up shared storage for use with XenServer, then you’re not going to find it here. I’m going to be talking specifically about tuning XenServer’s infamous “control domain” so that you can push a single XenServer host to the limit and make some sparks fly in your data center. I should also note that the following tweaks are not always recommended! More often than not, the default settings will suffice and XenServer scales pretty darn well out of the box. But if you’re hitting walls with ample CPU and memory available on your host, or coming across a scalability bottleneck you can’t seem to find, then these tweaks might warrant further investigation and testing. I’ve had to use one or more of these optimizations in the past when designing or implementing XenServer at large customers. So without further ado…
When a host running XenServer starts up, the Xen hypervisor loads a small virtual machine that is essentially invisible to users (and even to admins if you don’t know where to look!). This special, privileged VM is referred to as the “control domain”, “domain-0” or, as the cool kids like to call it, “dom0”. The control domain runs the management tool stack and also provides low-level services to other VMs, such as physical access to devices. Prior to XenServer version 5.6 FP1, dom0 was limited to a single virtual CPU (vCPU-0). This presented some issues in terms of single server scalability, and the problems really manifested themselves once we started putting more and more guest VMs on the host. There is no “magic number” for how many guest VMs a single host can sustain before that single vCPU allocated to dom0 is pegged, but I’ve seen problems on more than a couple of occasions once more than 60 or 70 VMs were fired up. That said, I’ve also seen single XS hosts scale to almost 100 VMs without any issues or tweaks. Beginning with version 5.6 FP1, we made a change and allocated four (4) virtual CPUs to dom0. This is enabled by default and has been a tremendous help in terms of single server scalability. And 99% of the time, you won’t need to allocate more vCPUs to dom0. However, in the rare case that you see dom0 becoming starved of CPU resources (with a tool such as xentop), you can actually allocate eight (8) vCPUs to dom0 instead by following this article. Interestingly enough, dom0 is actually allotted 8 vCPUs at boot time, but we have a special service called “unplug-vcpus” that reduces the number of vCPUs from 8 to 4. While this is adjustable, I’ve only had to do it once, and that environment was “special” to say the least. So I’d recommend leaving the default value of 4 vCPUs assigned to dom0 and monitoring to see if it’s ever a bottleneck…chances are it won’t be, and you’re good to go.
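As a rough sketch of what the above looks like at the console — the official procedure lives in the referenced article, and the exact service name and behavior can vary by XenServer release, so treat this as illustrative only:

```shell
# Run these from the XenServer host console (i.e., inside dom0).
# Illustrative sketch only -- verify against the Citrix article
# for your specific release before touching production.

# 1. Check whether dom0 ("Domain-0" in the list) is actually CPU-starved:
xentop

# 2. dom0 boots with 8 vCPUs; the "unplug-vcpus" init script then offlines
#    four of them. Disabling that script leaves all 8 online after a reboot:
chkconfig unplug-vcpus off
reboot

# 3. After the host comes back up, confirm the vCPU count visible to dom0:
grep -c ^processor /proc/cpuinfo
```

Again: only bother with this if xentop actually shows dom0 pegged — four vCPUs covers the vast majority of deployments.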
And if you’re not on FP1 by now, you better upgrade soon because those extra vCPUs really come in handy!
Now that we’ve got dom0 CPU squared away, let’s move on to memory. By default, dom0 is allocated exactly 752 MB of memory. This can also become a problem when you get into that 50+ VM territory. So we often recommend increasing the memory allocated to dom0 from 752 MB to 2.94 GB (and exactly that number – you don’t want to risk your XS deployment being unsupported!). Details on how to make this change are outlined in this article. If you’ve read any of our public XD or XS scalability whitepapers, you’ll notice that we almost always make this change. I’ve also been asked whether you can assign even more memory to dom0 for whatever reason. It should be noted that dom0 is a 32-bit domain, so you really can’t assign much more…I suppose you could assign closer to 4 GB, but then your XS deployment would be unsupported by us (according to Tech Support) and your data center might blow up (that’s a Consultant’s way of saying it’s risky). So we highly recommend changing that value to exactly 2.94 GB as I mentioned earlier…the Xen guys are smart and have their reasons.
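To make the mechanics concrete, here’s a sketch of what that change looks like. The precise token syntax on the Xen boot line differs slightly between releases (some include a max clause), so follow the referenced article rather than copying this verbatim:

```shell
# /boot/extlinux.conf (the Xen kernel "append" line) -- illustrative only.
# Back up the file first and follow the referenced Citrix article
# for your exact release.
#
# Change the dom0 memory allocation from the default:
#   dom0_mem=752M
# to:
#   dom0_mem=2940M      # i.e., exactly 2.94 GB -- the supported value
#
# (On some releases the token reads dom0_mem=752M,max:752M; if so,
# update both numbers.)

# After rebooting the host, sanity-check from inside dom0:
free -m
```

The reboot is required — dom0’s memory allocation is fixed at boot time by the hypervisor, not adjustable on the fly.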
Also related to memory is the heap size. Before XenServer 5.6, we used a default, static heap size of 16 MB and you could change that value in the same file as where you tweaked the memory allocated to dom0 (/boot/extlinux.conf). But with a heap size of 16 MB, we started hitting artificial ceilings once we loaded up numerous VMs, so we’ve since changed that value to 24 MB. And it’s also important to note that we no longer expose that number in that file (so if you’re looking for it on a current XS implementation, you’re not going to find it!). But if you’re still running XS 5.0 or 5.5 for whatever reason, this might be something to look into. The new default heap size of 24 MB will work 99% of the time. But in that rare situation where you’re loading hundreds of VMs onto each of your XS hosts, you might need to tweak it ever so slightly. In general, setting the value of “xenheap_megabytes” to (12 + MAX-VMs/10) is a good rule to follow. So if you’ve got a monster box with say 512 GB RAM and you’re shooting for say 200 VMs on each host (each with ~2.5 GB RAM), then you might want to increase the heap size and statically set it to 32 MB. For more information on this somewhat legacy setting, please refer to this article. As with any of these tweaks, I can’t stress how important it is to test and validate these settings in a non-production environment before ever attempting to do any of this in production.
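The sizing rule above is simple integer arithmetic, and working through the 200-VM example makes the 32 MB figure obvious:

```shell
# Rule of thumb from above: xenheap_megabytes = 12 + (max VMs per host / 10),
# using integer arithmetic.
MAX_VMS=200
HEAP_MB=$(( 12 + MAX_VMS / 10 ))
echo "$HEAP_MB"   # 32

# On XS 5.0/5.5 you would then append "xenheap_megabytes=32" to the
# Xen boot line in /boot/extlinux.conf (legacy setting -- see the article).
```

So 12 + 200/10 = 32 MB, comfortably above the old 16 MB default and the newer 24 MB default.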
Now remember a couple of minutes ago when I said that dom0 now leverages 4 vCPUs by default? Well, I kind of lied to you. It is technically capable of using 4 vCPUs, but by default, all network interrupt requests (IRQs) are processed by a single virtual CPU (vCPU-0)!!! And this little bottleneck can really wreak havoc in heavily utilized XenServer environments. I’ve actually seen a couple of dom0 bottlenecks associated with IRQs in the last few months alone. Granted, these were fairly sizable XenServer environments under heavy load, but I still think there are quite a few people out there who don’t know about this (and the problem can manifest itself in even smaller environments with frequent shift changes, for example). This limitation is documented in this short whitepaper (which is pretty well hidden…I know, I know) entitled “Achieving a fair distribution of the processing of guest network traffic over available physical CPUs”. I encourage everyone to read it, specifically section 5 where it talks about interrupt queues. But the net-net is we have a potential problem…and in order to mitigate this risk, you can either manually assign interrupt queues to vCPUs or you can simply install irqbalance. Irqbalance is a lightweight Linux daemon developed by some smart folks at Intel that dynamically distributes interrupts across all available vCPUs (so in our case it essentially allows dom0 to leverage all four vCPUs for network interrupts, which is key!). In each case I’ve seen this issue, we’ve recommended installing irqbalance on each XenServer host. It’s a piece of cake to install and it works like a charm. I should also note that we’ll likely (you can’t quote me…) include this handy daemon with all future releases of XenServer going forward.
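Since dom0 is CentOS-based, installing irqbalance looks like any other daemon install. A sketch, assuming your dom0 can reach a package repository (some locked-down hosts can’t, in which case you’d copy the RPM over manually):

```shell
# Run inside dom0. Sketch only -- package availability depends on how
# your dom0's yum repos are configured.
yum install -y irqbalance
chkconfig irqbalance on
service irqbalance start

# Before/after check: without irqbalance, the network interrupt counts
# pile up in the CPU0 column; with it running, they should spread
# across CPU0-CPU3 over time:
cat /proc/interrupts
```

The /proc/interrupts check is the easiest way to prove to yourself (and your customer) that the daemon is actually doing its job.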
I think that’s it for today – I could keep going on about the broader topic of “XenServer Scalability” but everything else is non-dom0 related for the most part (resource pool design, number of VMs per LVM-based LUN, jumbo frames, optimizing your servers for performance, etc.). But we at least covered all the basics related to dom0 tuning. With these tweaks, you just might be able to squeeze a few more VMs onto each of your hosts (or prevent disasters when scaling up)…and that could be a major cost savings in a large environment. Hopefully you learned something new today…and if you did, drop me a note below because I’d love to hear about it.
Senior Architect, Citrix Consulting