There is always something a bit surreal about blog posts on the Internet; more so when older ones come back around.  A case in point is this one from Sysconfig’s blog, which was recently dredged up on Twitter.  In it he lists a series of things he’s had to contend with, and details some of the steps needed to resolve them and what led to his ultimate decision to migrate to Xen.

I freely admit that there are some pitfalls which could create problems for administrators, but as with many articles out there, this one lacks one very important detail: the product version isn’t called out front and center, leaving the reader to guess whether it applies to them.

As I sit here today, the current shipping version of XenServer is 6.0.2 and I’ve just been on two XenServer Master Classes with well over a thousand attendees where we are talking about the upcoming XenServer Tampa release.  Going by the date in the blog post, and by some comments, I would guess that the author was running 5.6 but that some of the issues experienced related to 5.5.

For those of you wondering about the release dates, 5.5 was released in June of 2009 and 5.6 in May of 2010.  If we look at some of the items listed, we can see the progress we’ve made in improving XenServer:

  • NIC replacement.  Yup, this is a problem, but will be fixed in Tampa.  The core issue is that in our attempts to ensure that race conditions in the Linux enumeration of devices didn’t have management interfaces moving around and bonds breaking, we didn’t take care of the case of NIC replacement.  That’s now been resolved; hopefully once and for all.  Oh and the solution also works for cases where a backup of XenServer is restored on a server where no hardware is common from the original server.
  • Pool member network settings get hosed.  This was a problem, but I’m glad to say that as of 6.0.2 it has been resolved by the Emergency Network Reset feature.  I’ve personally used this a couple of times and can say that it has saved hours of reconfiguration that used to be required.  It’s particularly useful if you are using local storage for VMs, since reinstalling XenServer probably isn’t a great option for those local VMs.
  • Space reclamation.  This used to be a big problem in the XenServer 5.x days, but ever since XenServer 6.0 we do “online leaf coalescence”, which is what makes VM Protection and Recovery an effective solution for basic backup requirements.  While some storage issues won’t be fully addressed within Tampa, we are laying the groundwork to ensure they too are a thing of the past in very short order.
  • Modified network bridge code.  We are guilty of doing that, but since XenServer 6.0 we default to using Open vSwitch (ovs), so our network stack is not only more powerful and flexible, but we are basing all our future network capabilities around what ovs offers out of the box.  A perfect example of this is the upcoming Tampa support for LACP and bonds containing up to 4 NICs.  Both of those are features supported by the Distributed Virtual Switch (ovs) in XenServer and not implemented in the older bridge code.

In the end I welcome blog entries like this one from Sysconfig.  He clearly states “XenServer is solid and very easy to setup and use” and that “It runs and runs and runs”, but also highlights that for the versions in question, “if something unexpected happens, you are seriously screwed”.

We’ve been working hard to improve the supportability of XenServer and have made significant strides in the past few releases.  We understand that it’s not enough to build a world-class hypervisor which is running millions of mission-critical workloads in hundreds of thousands of organizations globally; we also need to ensure that the “10 Minutes to Xen” philosophy extends to ease of management and maintenance.