As Neale Walsch points out in his quotation below, we find out things about ourselves once we approach the upper limits of our normal ways of working. Certainly, things become a lot more interesting up there.
As explained in a previous article, this is true whether you’re training and testing yourself against the mountains of the world, or studying the performance extremes of complex system software for virtualisation. When you’re pushing this hard, the training, conditioning and equipment you have become a key component of how far you can go.
This is what we found when we tested the latest, publicly available version of XenServer. It’s called 6.5 SP1.
We found that several improvements we had made to the code in SP1 should allow us to scale higher than the earlier 6.5 release in the number of virtual machines (VMs) we could run simultaneously on a single server. Cool! But how much higher is higher?
Our previous measurements had shown that XenServer 6.5 was fit and able to start and run a maximum of 500 Windows VMs – and to have all of them remain responsive and performant for their end users. All the training we had now done for 6.5 SP1 (i.e. all of our theoretical understanding) indicated that we were now good to start up to 1000 VMs with this latest version.
How many of them would actually be able to run and remain responsive at such altitudes? We were super-excited to see how much further 6.5 SP1 could take us.
We found that, up this high, we started to see limitations in the server equipment itself. Much like a climber might have to use ultralight equipment, special ice screws or stoves which can heat food at high altitude, we needed to use machines with faster (10 Gbps) networking, faster storage (SSDs with IntelliCache) and a larger number (120) of physical CPUs in order to find the new limits of XenServer itself.
The hardware we used for these tests was a Dell PowerEdge R920 with 4x Xeon E7-4890 v2 CPUs (120 pCPUs), 1 TB RAM, 2x Samsung MZ-7WD400E 400 GB enterprise SSDs in RAID-0 on an LSI MegaRAID SAS-3 3108 Invader controller, and a 10 Gbps NIC connected to a 10 Gbps NFS filer.
Results of all that training
The results came in, and are shown in the following graph, where the x-axis is the number of VMs on the host and the y-axis is the number of those VMs which are responsive to the needs of a typical knowledge worker (email, browsing, desktop applications etc.). This is the LoginVSI score achieved. The green line is XS 6.5 and the red line is XS 6.5 SP1.
From this, I am very happy to say that 6.5 SP1 is able to run an astonishing 20% more responsive VMs than 6.5 using the LoginVSI 3.5 workload. It achieves a LoginVSI max score of 600 out of the maximum of 1000 Windows 7 VMs which we can run on this host.
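For anyone who wants to check the arithmetic behind that 20% figure, here is a minimal sketch in Python. It assumes the 500 responsive VMs quoted earlier for 6.5 is the directly comparable LoginVSI score; the function and constant names are made up for illustration and are not part of any XenServer or LoginVSI tooling.

```python
def relative_gain(baseline: int, improved: int) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Figures quoted in this article (names are hypothetical labels).
XS65_RESPONSIVE_VMS = 500      # responsive VMs on XenServer 6.5
XS65SP1_RESPONSIVE_VMS = 600   # responsive VMs on XenServer 6.5 SP1
MAX_STARTED_VMS = 1000         # VMs we could start on this host

gain = relative_gain(XS65_RESPONSIVE_VMS, XS65SP1_RESPONSIVE_VMS)
responsive_share = XS65SP1_RESPONSIVE_VMS / MAX_STARTED_VMS

print(f"gain over 6.5: {gain:.0%}")                  # gain over 6.5: 20%
print(f"responsive share: {responsive_share:.0%}")   # responsive share: 60%
```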
What’s really interesting, much like Neale Walsch’s quote, is that as we reached the limit around 600 VMs running the LoginVSI benchmark, the degradation was “soft”, i.e. the line didn’t immediately snap down to a lower value (or zero) as we might expect when a system stops working. That would be the equivalent of collapsing on the summit. Instead, we see a gentle change in the gradient, which is due to a hardware bottleneck: the 120 physical CPUs available in the host. This means that XenServer reached this peak quite happily, and had more in the tank if we set it a bigger mountain to climb.
With 600 LoginVSI VM sessions, all of the server’s CPUs were constantly at 100% utilisation. This generally means that it wasn’t XenServer which was tired or worn out by this stage, or at a fundamental limit (outside of the usual small CPU overheads of any virtualisation platform). Indeed, 6.5 SP1 showed no scalability or I/O bottlenecks in these tests. The only bottleneck was the host hardware itself, i.e. the size of the mountain we had asked it to climb. Hence, by increasing the challenge, e.g. by improving the hardware, we should be able to get even better LoginVSI scores with this release of XenServer.
We’ve shown that we can run 1000 real VMs on a single host, and that for desktop workloads like LoginVSI, 600 of these can run comfortably with great responsiveness and performance on the server we were using. With a bigger server, we fully expect XenServer to be able to run the full 1000 with the same responsiveness for full desktop workloads like LoginVSI.
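To get a rough feel for what "a bigger server" might mean, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that responsive sessions scale linearly with the number of physical CPUs at the density we measured (600 sessions on 120 pCPUs). That linear scaling is an assumption and not something we have tested, and the function name is hypothetical.

```python
import math


def pcpus_needed(target_sessions: int,
                 measured_sessions: int,
                 measured_pcpus: int) -> int:
    """Estimate pCPUs for `target_sessions`, ASSUMING linear scaling.

    This extrapolates from one measured point and ignores memory,
    storage and network limits, so treat the result as a guess.
    """
    sessions_per_pcpu = measured_sessions / measured_pcpus
    return math.ceil(target_sessions / sessions_per_pcpu)


# Measured in this article: 600 responsive sessions on 120 pCPUs,
# i.e. 5 sessions per pCPU.
estimate = pcpus_needed(target_sessions=1000,
                        measured_sessions=600,
                        measured_pcpus=120)
print(estimate)  # 200
```

Real hosts rarely scale perfectly linearly, so the number is best read as a starting point for choosing the next test machine, not as a prediction.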
Where will this take us?
Mountains are all around us. A youngster starts with local hills and crags, then moves to the Alps or the Rockies to further test their mettle; further training leads to testing oneself against the higher ranges of the Himalayas. For XenServer, this means working hard again on new versions of the code (new alpha and beta versions are available on xenserver.org), which will further increase its capabilities. It also means improving our equipment, i.e. upgrading our workloads to the latest LoginVSI (4) benchmark suite and running our tests on bigger servers with more powerful CPUs, and more of them.
Then, and only then, will we be able to see whether we have truly understood Hillary’s answer and conquered ourselves, not just the mountain.