I’m a car enthusiast — mostly for European small hatches (don’t judge!) — and also a big fan of newer, better technologies, so when it comes time to buy a new car, I tend to focus on two things: smaller and faster engines. Keeping with the engine “downsizing” trend, adding a turbocharger is the way to go.

My love of cars notwithstanding, that entire passage was really just cover for the fact that I’ve appropriated the title of this post from Miguel Contreras (Turbo Charging your IOPS with the new PVS Cache in RAM with Disk Overflow Feature!) and Nick Rintalan (Turbo Charging Performance with PVS 7.7). We’ve already turbocharged several things around here, so I said to myself, “Why not keep up with the trend?”

I’ve stumbled upon the fact that our XenServer backup exports are far from ideal, both in the time they take to complete and the space they occupy on the backup drive. Yes, you can compress them, but that takes forever. Yes, you can use the better xe vdi-export technique in your backup scripts, but then you have to compress the files at the destination, and you also get “just” the VHD file without the virtual machine parameters (CPU/RAM/NICs), which adds time and complexity to the backup/restore strategy.

What if I told you that it is in fact possible to run a full XVA export, with all the VM information, at the same (or even faster!) speed as the newer vdi-export method, but compressed to roughly the same size as the glacially slow compressed XVA export?

Yes, we can! Bear with me for a few minutes and I’ll show you how!

Let’s define the backup options

There are countless posts around the web about backup methods and/or scripts for XenServer, but most (if not all) of them come down to just two underlying commands:

  1. xe vm-export: This is the traditional way of exporting VMs, snapshots, etc. When you do an export from XenCenter, this is the command the server issues. The good things about this command are that:
    1. It exports only the used blocks from your VHD files. So, if you’ve been doing your cleanup job correctly from within the virtual machine, the result should be smaller than the total VHD size. In my examples below, you’ll see a full-fat 32GB VHD with ~20GB worth of operating system data, configuration, and application files.
    2. It also exports the full XVA, which makes it easier to import in a recovery scenario. The bad thing is that it takes a while to complete, although the speed is acceptable (YMMV depending on hardware/networking configuration, of course).
  2. xe vdi-export: This is the newer API supported by the Xen Project, which works directly with the VHD; it’s basically the main player in Storage XenMotion and has been around for quite a few years now. This one is blazing fast for copying VHD files.
    The good thing about this one is, again, speed. You can easily saturate your storage IO, and if you have pretty good storage, you can get near wire speed on fast networks. The bad things are that:
    1. If a VM has multiple VHDs, you’ll have to export them all individually.
    2. The resulting size will be the full-fat VHD size (32GB in my test case). So, if you’re saving over the network, you’ll have extra transmission time too.
    3. Finally, it won’t save the VM parameters, so you should also back up the XenServer metadata in order to keep all the machine information (CPUs/RAM/NICs, etc.).

One of the backup scripts I like the most is the NAUbackup script from Northern Arizona University, which leverages both export methods. For instance, on a VM with two VHDs (a 32GB OS disk and a 1TB data disk), exporting the whole VM might be overkill, so you’ll be better off running vdi-export for the OS VHD and leaving the 1TB disk alone so backups can be taken from inside the VM (mail servers, SQL servers, file servers, etc.).
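As a rough sketch of that selective approach (all UUIDs and paths below are placeholders, and the metadata flag is how I understand vm-export’s metadata-only mode — double-check it against your XenServer version):

```shell
#!/bin/sh
# Back up only the OS disk plus the VM definition, leaving the 1TB data
# disk to in-guest backups. All values are placeholders -- substitute your own.
VM_UUID="uuid_of_vm"
OS_VDI_UUID="uuid_of_os_vhd"    # find it via: xe vm-disk-list uuid=$VM_UUID
DEST="/path_to_store"

# Export just the OS VHD (fast, but full-size and without VM parameters).
# For a running VM, snapshot the VDI first and export the snapshot instead.
xe vdi-export uuid="$OS_VDI_UUID" filename="$DEST/os-disk.vhd"

# Export the VM metadata (CPUs/RAM/NICs) so a restore can rebuild the VM.
xe vm-export metadata=true vm="$VM_UUID" filename="$DEST/vm-metadata.xva"
```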

Let’s look at our playground and our baseline numbers

Let me be specific about my test server so the numbers make sense. I’m running XenServer 7.2 on a six-year-old dual Xeon X5675 server with 128GB of RAM. The local storage repository consists of 3x 450GB 10K RPM SAS disks configured in RAID 0 for optimal performance; a quick hdparm -t /dev/sda shows a buffered transfer rate of ~400MB/sec. My test VM is a Windows 2012 R2 VM with 4GB of RAM, 4 vCPUs, and a 32GB VHD.

Backups are taken over a single 1Gbps link through a home-grade switch to an external machine running 7200rpm SATA disks with a maximum write speed of 110MB/sec and an average of 80MB/sec; home-grade equipment all around.

I first exported using the regular commands/tools, for a total size of ~19.5GB. I wanted to save space, so I went ahead and used the compress=true modifier, which cut the size roughly in half, to ~9.6GB.

Also, just for fun, I exported the full-size VHD with vdi-export so we have that reference too. Below is a graph illustrating the results:

Commands used were:
xe vm-export vm={uuid_of_vm_or_snapshot} filename=/path_to_store/filename_compressed.xva compress=true
xe vm-export vm={uuid_of_vm_or_snapshot} filename=/path_to_store/filename.xva
xe vdi-export uuid={vhd_uuid} filename=/path_to_store/filename.vhd

As you can see, the regular export took a decent 6 minutes to complete, using 19.9GB on the destination.

The vdi-export used the full 32GB and took longer simply because it was a bigger file; on my home network, it couldn’t achieve any higher bandwidth than the regular vm-export.

Finally, the compressed vm-export took a whopping 17 minutes and 30 seconds to complete, for a 9.64GB resulting file. For me, that was way too much.

Investigation starts

I wanted to see what was going on and why I couldn’t get faster compressed results. I understand that my source and destination storage IO is far from ideal, so write speeds of 50–70MB/sec on the backup server for both vdi-export and uncompressed vm-export seemed logical. But about 6MB/sec compressed? Not good. To be honest, it wasn’t very complicated to spot the culprit. A quick “top” command on dom0 showed…

[Screenshot: top output in dom0 showing gzip at 100% CPU]

gzip was sitting at 100% CPU utilization in dom0. That’s Linux-speak for “I’m using one full core and ignoring the other 15 that dom0 has” (my default XenServer 7.2 installation assigned 16 vCPUs to dom0). So this MUST be the issue, right? Correct! A quick Google search confirmed that gzip won’t multi-thread. So where to now?

Meet PIGZ

Parallel Implementation of GZip for modern multi-core machines (or “pigz,” as the author calls it). It’s basically gzip with support for multiple cores.
Installation works like any other Linux RPM package. You can get the pre-compiled RPM binaries here; below is the full command I ran on dom0 to download and install pigz:

wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/pigz-2.3.3-1.el7.centos.x86_64.rpm && rpm -ivh pigz-2.3.3-1.el7.centos.x86_64.rpm
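Before relying on pigz mid-script, a quick sanity check that it actually landed on the PATH can save a failed backup run. A small sketch (file paths here are throwaway examples), falling back to plain gzip so the pipe still works on hosts where the install failed:

```shell
#!/bin/sh
# Sanity check: prefer pigz, fall back to single-threaded gzip if missing.
COMPRESS="pigz -c"
command -v pigz >/dev/null 2>&1 || COMPRESS="gzip -c"

# Push some throwaway data through the same pipe pattern used below.
head -c 1048576 /dev/zero | $COMPRESS > /tmp/pipe-test.gz
ls -l /tmp/pipe-test.gz
```

If the fallback path triggers, you still get a valid .gz file, just compressed on one core.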

Now, let’s export a plain XVA file but, at the same time, pipe its output to a compressor and write the result to a file on a storage repository (leaving filename= empty makes vm-export write to standard output):

xe vm-export vm={uuid_of_vm_or_snapshot} filename= | pigz -c >/path_to_store/filename.xva.gz

Let’s check the results and compare them with the previous one:

Results:
Total destination size: 9.63GB
Total time elapsed: 3 minutes 37 seconds!
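One thing to keep in mind about the .xva.gz result: XenCenter won’t import it directly, so the restore is the same pipe in reverse. A hedged sketch (I haven’t verified /dev/stdin piping on every XenServer version, so treat it as an assumption and test a restore before relying on it):

```shell
# Decompress on the fly and feed vm-import from the pipe -- no temporary
# uncompressed XVA needed on the restore side.
pigz -dc /path_to_store/filename.xva.gz | xe vm-import filename=/dev/stdin

# Or, the conservative two-step version:
# pigz -d /path_to_store/filename.xva.gz   # leaves filename.xva behind
# xe vm-import filename=/path_to_store/filename.xva
```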

That’s impressive! A few notes though:

  1. Is it too good to be true? No! It takes half the time of the uncompressed export simply because it has to push fewer bytes through the network in my environment.
  2. CPU utilization in dom0 went from less than 4% (with the normal compress method) to ~40%. Of course, compression is a CPU-taxing activity, and doing it faster makes dom0 work harder, but it always keeps room for other critical processes (xapi, vswitch, running VMs, tapdisk, etc.).
  3. Per Tobias Kreidl’s suggestion (please read our conversation in the forums), I also tested eliminating the network and destination storage from the equation by piping the results to /dev/null (that’s *nix wording for “send that to limbo!”). I still see about 50% faster speeds with pigz than with regular gzip.
  4. Also, the pigz method was within ~10% of the time it takes to export an uncompressed XVA when piping to /dev/null (2 min 47 sec uncompressed to null versus 3 min 4 sec pigz to null) in my test environment. That shows how efficient this is.
  5. Again, your mileage may vary. If you have powerful last-gen Xeons with blazing-fast PCIe SSDs on both ends, your results could be night and day compared to mine.
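If you want to reproduce the /dev/null comparison from notes 3 and 4 on your own host, the commands look like this (the UUID is a placeholder, as before):

```shell
# Compression speed only -- network and destination storage taken out of play.
time xe vm-export vm={uuid_of_vm_or_snapshot} filename= | gzip -c > /dev/null
time xe vm-export vm={uuid_of_vm_or_snapshot} filename= | pigz -c > /dev/null
time xe vm-export vm={uuid_of_vm_or_snapshot} filename= > /dev/null
```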

Conclusion

This is NOT FREE performance; beware of that. It comes at the expense of dom0’s CPU time. If you are exporting backups in the middle of the day, expect some impact on the other VMs running on that XenServer host. Also consider network bandwidth to wherever you are saving backups: if you’re currently extracting backups uncompressed, utilization should stay more or less the same, but if you’ve been using the compressed method, expect higher utilization on that link, which may justify a dedicated backup network in some cases.
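If that mid-day CPU impact is a concern, pigz’s -p flag caps the number of threads it spawns, trading some speed for headroom (the thread count below is an arbitrary example value to tune for your host):

```shell
# Limit pigz to 4 of dom0's vCPUs so compression can't crowd out xapi,
# vswitch, and the running VMs.
xe vm-export vm={uuid_of_vm_or_snapshot} filename= | pigz -p 4 -c > /path_to_store/filename.xva.gz
```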

[Screenshot: example of CPU utilization in dom0 during a pigz-compressed backup]

Please be sure to give this compression method a try and comment down below with your own results. Also, I would like to hear if the higher CPU Utilization would be a big deal for your individual cases or not.

Aldo Zampatti
Principal Consultant, Citrix Consulting Services
