UPDATE (March 2014):  I originally wrote this article in December 2011, and there have been some interesting developments on the RSC front and in the Xen architecture since then (I know some of you are aware of these already, since you’ve reached out to me!).  But in case you haven’t heard, or haven’t read my latest comments below this article, RSC is now disabled by default in xentools 6.2, and we recommend leaving it disabled in almost all circumstances.  This is different from some of the guidance we originally provided (and I have finally updated my PPT to reflect it!).  For large hosts with high density or lots of VMs, we are better off at this time allocating dom0 more vCPUs rather than enabling RSC.  We have found that disabling RSC results in a more balanced system overall, and it is “safer” in the sense that individual netback threads don’t get starved.  We are still working to “fix” this issue, but we likely won’t fix it via RSC; it will more likely be addressed with driver domains in a future Xen architecture, which is a much safer way to “extract” load from dom0.  Stay tuned on that front, but that is the important update I wanted to share, since I’ve received lots of questions about this feature and about some of the reports TaaS generates when RSC is enabled with lots of VMs and/or VIFs.
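For those asking how to actually give dom0 more vCPUs, here is a rough sketch of what that looks like on a XenServer 6.x host.  Treat it as illustrative only: the exact boot-config file, syntax and sensible vCPU count depend on your release and hardware, so double-check the official Citrix guidance for your version before touching a production host (and note that the change requires a host reboot).

    # Run inside dom0 (the control domain) on the XenServer host.

    # 1. Check how many vCPUs dom0 currently has -- dom0 reports its
    #    vCPUs as ordinary processors in /proc/cpuinfo.
    grep -c ^processor /proc/cpuinfo

    # 2. To allocate more vCPUs to dom0, add (or edit) the dom0_max_vcpus
    #    option on the Xen command line in the boot loader config -- on
    #    XS 6.x that is typically /boot/extlinux.conf.  The value of 8
    #    below is just an example; size it for your host and VM density.
    #    e.g. append "dom0_max_vcpus=8" to the xen.gz options, save, then:
    reboot

    # 3. After the reboot, repeat step 1 to confirm the new count.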

It’s been a couple of months since my last article on PVD, but I promise I haven’t been slacking – I’m officially a father!  So I took a few weeks “off” (if you want to call it that), but now I’m back, better than ever, and I have something pretty cool I’d like to share with the community.

Over the last month or so, I’ve been working on a little side project with the XenServer Engineering team.  Thanks to my friend, Bill Carovano, I was lucky enough to get hooked up with one of our brightest XS engineers, Rok Strnisa.  If that name sounds familiar, it’s probably because you’ve come across his famous “Network Throughput Guide” on the Xen Wiki.  If you haven’t read that guide, put it on your “to do” list and read it over the holidays when you have a chance…I can’t recommend it enough.

Anyway, as many of you know, we released “Project Boston” (XS 6.0) a few months back.  But what many people might not know is that we made a couple of significant changes under the hood with XS 6.0.  Specifically, we changed the networking backend from “bridge” (Linux) to OVS (Open vSwitch).  And we implemented a new feature called Receive-Side Copy (RSC), which effectively disaggregates XenServer’s control domain by moving more of the work (namely, copying received data) into the guest, thereby freeing up critical CPU resources within dom0.  While the bridge-to-OVS change is certainly important, the RSC feature matters just as much because of the potential impact it can have on XenServer’s “scalability” from a networking perspective.
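By the way, if you want to check which networking backend one of your own hosts is running before (or after) reading the slides, here is a quick sketch using the standard tooling that ships with XS 6.0.  As always, try it in a lab first, and note that switching backends requires a host reboot.

    # Run inside dom0.

    # Show which networking backend the host is configured to use --
    # this file normally contains either "bridge" or "openvswitch".
    cat /etc/xensource/network.conf

    # Switch to Open vSwitch (use "bridge" to go back), then reboot the
    # host for the change to take effect.
    xe-switch-network-backend openvswitch
    reboot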

And that’s precisely where Rok comes in…Rok completed some very detailed performance and scalability tests a few weeks ago to look at these important changes we made under the hood.  And I’ve spent the last couple of weeks interpreting and documenting the results, and even coming up with some new best practices around how we might virtualize workloads like PVS on XS6.  And lucky for you, we’re ready to share these fascinating test results and new best practices for the first time…I give you “How to Tweak the $!@# Out of XenServer”:

That’s right – I’m giving you a raw PowerPoint presentation I put together that summarizes the tests that Rok and his colleagues conducted.  In addition to Rok’s analysis, I’ve included some of my own analysis and “gotchas” at the end of the presentation.  So if you’re wondering about any of the following (like I was), click on the link, download the ~5 MB presentation, and check it out:

  • What is the scalability impact of changing the networking backend from bridge to OVS?
  • I still don’t understand what RSC is or does from your description…can you explain?
  • Is RSC enabled or disabled by default in XS 6.0?  How about XS 6.2?
  • In which scenarios might enabling RSC make sense (XA, XD, virtualizing PVS, etc.)?
  • What impact does RSC have on domU (not to be confused with dom0) scalability?
  • Are we now able to saturate a 10 Gb link with XS 6.0?  Is this with or without SR-IOV?  (If you want to test your own hosts, see the quick iperf sketch after this list.)
  • Which of the 4 “scenarios” tested yielded the best results (RSC on OVS, RSC on bridge, no RSC on OVS, no RSC on bridge)?
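On that last question about saturating a 10 Gb link: the slides have the actual numbers from Rok’s test rig, but if you want to sanity-check throughput between a couple of your own VMs, the classic tool for the job is iperf.  A minimal sketch, assuming two Linux VMs with iperf installed, and with 10.0.0.1 standing in as a placeholder for the receiving VM’s IP:

    # On the receiving VM: start an iperf server.
    iperf -s

    # On the sending VM: run a 60-second test with 4 parallel streams
    # (a single TCP stream often can't fill a 10 Gb link on its own, so
    # parallel streams help separate per-stream limits from link limits).
    iperf -c 10.0.0.1 -t 60 -P 4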

One thing I’d like to point out – make sure to read the slide notes!  There is some really good information in the notes that I don’t want you to miss…and in some cases, like when you see an asterisk (*) on a slide, that’s where I explain what it means.

Other than that, I hope you find this information valuable.  Let us know if you have any feedback (good or bad) by leaving a comment below.  And if you have any questions, please feel free to ask – I bet I won’t be able to answer everything, but that’s why I have friends who write code in Cambridge, UK. 😉

Cheers, Nick

Nick Rintalan, Senior Architect

Americas Consulting, Citrix Consulting