UPDATE (April 2014) – My colleague and I recently published an article with updated info on NetScaler-specific options. And this article is hot off the press with the all-new NetScaler 10.1 option, so check it out!
In order for Provisioning Server (PVS) to boot a target device, a “bootstrap file” needs to be made available when the device is turned on. The bootstrap file in this case is called “ARDBP32.BIN” (the Ardence Boot Program…gotta love our roots!). This bootstrap file contains the information needed to communicate with PVS so it can initialize the streaming process and ultimately mount the vDisk. The default mechanism for delivering this small bootstrap file to a target device is by downloading it via TFTP. But are there other options? How do you go about making TFTP highly available if you use DHCP options instead of PXE? Do you absolutely need a hardware load balancer like a NetScaler or F5? I’m going to attempt to answer these questions and more in this article.
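For readers who haven't looked at TFTP on the wire, here is a minimal sketch of the download a target device performs, assuming the standard read request layout from RFC 1350. The function names and the `fetch` helper are illustrative and not part of PVS, and real PXE firmware may also negotiate options (such as block size) that this sketch ignores.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # default TFTP data block size (RFC 1350)


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def fetch(server: str, filename: str = "ARDBP32.BIN", timeout: float = 3.0) -> bytes:
    """Download a file over TFTP; raises on timeout or a TFTP ERROR packet."""
    data = bytearray()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        expected = 1
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            opcode, block = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                # For ERROR packets the second field is the error code.
                raise IOError(packet[4:].rstrip(b"\x00").decode("ascii", "replace"))
            if opcode != OP_DATA or block != expected:
                continue  # ignore duplicates and unexpected packets
            data += packet[4:]
            # ACK goes to the address the data came from, not to port 69.
            sock.sendto(struct.pack("!HH", OP_ACK, block), peer)
            if len(packet) - 4 < BLOCK_SIZE:
                return bytes(data)  # a short block marks the end of the file
            expected = (expected + 1) % 65536
```

Note that the server answers from a new ephemeral port rather than port 69, which is why the ACK goes back to `peer`. That port hop is one reason TFTP is more awkward to put behind a load balancer than a typical TCP service.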
Another reason I wanted to write this article is because I’ve been hearing a lot of people blindly recommend to “Use NetScaler to load balance TFTP” lately. I’ve also seen many consultants and engineers fall flat on their faces when they try to make TFTP highly available with a hardware load balancer. This might sound easy in theory but in practice it’s really not. Hopefully after you’ve read this blog you’ll understand why load balancing TFTP is anything but trivial and what other options might exist when it comes to delivering that precious Ardence bootstrap file.
Before I begin (in true Consultant fashion), I want to quickly run down what’s not in this article. I’m not going to talk about how to implement each of these options in detail. I’m also not going to talk about the other things required in order to make PVS highly available – because in my mind, making this aspect of PVS highly available is the trickiest part. And lastly, I’m also not going to discuss every option in the world for delivering a bootstrap file or making TFTP highly available…believe it or not, there are even more options than the ones I’ve listed below and there are variations of each of these! But the options below are the most common ones that I’ve actually seen customers implement in the real world.
What follows is a list of the most common options for delivering the bootstrap and the pros and cons of each option so you can ensure your next PVS implementation is highly available from start to finish.
1. DHCP Option 66 with a Single PVS/TFTP IP. Sadly, this is probably the most common thing I see out there. Since most DHCP servers will only accept a single entry for option 66 (and/or most PXE clients are unable to interpret multiple entries in option 66), some people just put in the IP address or hostname of their first PVS box and call it a day. Note that I’m assuming you are using the built-in TFTP server that ships with PVS (BNTFTP.exe), so your PVS server is your TFTP server and vice-versa. There really aren’t a whole lot of advantages to this method beyond it being easy and free. The disadvantage, a glaring single point of failure in your environment, far outweighs anything good that comes out of this option, so you really should never do this, especially since the next option is just as easy to configure and at least gives you some level of fault tolerance.
2. DHCP Option 66 with DNS Round Robin. This is really the same thing as the first option, but instead of putting in a single IP address or hostname corresponding to a single PVS box, you simply put in a hostname that has multiple A (host) records in DNS. By default, DNS will rotate the resource records (the PVS servers in our case) in a “round robin” fashion. In other words, the first target device to boot will likely get the first PVS IP once it receives option 66 from DHCP and the second target device will receive the second PVS IP. The advantages of this option are simple configuration and a basic level of fault tolerance. The disadvantages of DNS RR are discussed widely on the internet, so utilize your Google skills if this is a new concept to you. I’ll just say that DNS RR has “limited intelligence” (it doesn’t check whether a server is actually up before handing out its address) and it’s definitely not as good as something like a NetScaler, which can provide true HA and intelligent load balancing.
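To make the “limited intelligence” of option 2 concrete, here is a toy model of DNS round robin. It is a simulation, not a real DNS query; the class name, the IP addresses and the `down` set are all made up for illustration.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a DNS name with several A records, rotated per query."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query sees the next record first
        return answer


def first_answers(zone, boots):
    """Address each of `boots` target devices would try first."""
    return [zone.resolve()[0] for _ in range(boots)]


# Two hypothetical PVS/TFTP servers behind one hostname.
zone = RoundRobinZone(["10.0.0.11", "10.0.0.12"])
picks = first_answers(zone, 4)

# DNS knows nothing about server health: if 10.0.0.12 is down,
# it is still handed out to every other target device.
down = {"10.0.0.12"}
failed = [ip for ip in picks if ip in down]
```

In this model half of the booting devices are pointed at the dead server, which is the gap a health-checking load balancer is meant to close.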
3. DHCP Option 66 with Multiple Entries. If you happen to be using a DHCP server that accepts multiple entries in option 66, then you can simply list both PVS/TFTP IP addresses separated by a semicolon and call it a day. The catch is that the client also needs to be able to interpret that semicolon and/or multiple entries in that DHCP option. I’ve only seen this work in a couple of cases and I’ve seen probably 50 PVS deployments. And I’ve never seen anyone get it to work in a Microsoft DHCP implementation with VMs hosted on XenServer. So while this option provides high availability and is very easy to configure, it’s just not feasible in most scenarios. Unfortunately, I can’t tell you which customers I’ve seen this work for, but my advice is to just test it and see if it works if you have a non-Microsoft DHCP server/appliance.
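As a rough sketch of what a client would have to do for option 3 to work, the snippet below splits a semicolon-separated option 66 value and falls back to the next entry when a server is unavailable. The function names and the `is_up` callback are hypothetical; this is not how any particular PXE firmware is implemented.

```python
def parse_option_66(value: str) -> list:
    """Split a semicolon-separated option 66 value into server names/IPs."""
    return [entry.strip() for entry in value.split(";") if entry.strip()]


def first_reachable(servers, is_up):
    """Return the first server for which the is_up callback returns True."""
    for server in servers:
        if is_up(server):
            return server
    return None  # no TFTP server answered
```

For example, `first_reachable(parse_option_66("10.0.0.11;10.0.0.12"), is_up)` would return the second address if the first server fails the check. A client that treats the whole string as one hostname never gets that far, which matches how rarely this option works in the field.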
4. Proxy DHCP (Citrix PVS PXE Service). Instead of using DHCP options 66 and 67, simply use the built-in PXE server that ships with PVS. I think a lot of people out there just don’t realize that you don’t actually need to use DHCP options! We provide a PXE service (BNPXE.exe) you can enable when you initially configure PVS. With this method, a target device boots and sends out a broadcast message to find a suitable PXE server, so if one of the PVS/PXE servers is down, the other will still respond and ultimately provide the bootstrap. So you get high availability with this approach and you don’t even need to bother the DHCP/DNS admins to add those options. The downside of this approach is the PVS servers have to be in the same broadcast domain as the target devices but that’s usually how you’d want to set it up anyway. Of course you could potentially configure IP Helper to relay those Proxy DHCP/PXE broadcast packets to remote broadcast domains but I don’t see customers doing that very often. The other potential downside to this approach is there could be other PXE services already on the network (Altiris, etc.). But I really don’t see this come up that often in the enterprise space since the networks aren’t “flat” and tend to be designed pretty well using subnetting best practices. So this is a really good option in my opinion with limited drawbacks, assuming a proper L2/L3 network design.
5. NetScaler with USIP Address Mode. Now let’s say we have a NetScaler in the mix (and really this goes for any hardware load balancer)…this is where things start to get interesting. The easy part is configuring a VIP on the NS that corresponds to your two TFTP/PVS servers and putting that VIP in DHCP option 66 (again I’m assuming you have PVS and TFTP collocated). The hard part comes after that. If you are using a default NetScaler install/config, then any backend server (a TFTP/PVS box in our case) sees the Mapped IP or Subnet IP address (MIP or SNIP) as the source of the request instead of the actual client IP address. This presents a problem for TFTP since it requires the actual client IP to communicate properly. Luckily we can configure the NetScaler in something called “USIP Address Mode” (Use Source IP) which effectively does exactly what we need for TFTP – instead of using the MIP or SNIP as the source address, the NetScaler passes the actual source/client IP through to the server. But now that we are using USIP, the servers reply to the client IP directly, so we need to change the default gateway of the TFTP servers to an IP on the NS (the MIP or SNIP) so that the return traffic goes back through the NetScaler. Sounds easy enough…but what did we just do? Since we are using the TFTP service that comes with PVS and they are on the same box, we effectively just routed all PVS traffic through the NS – including our precious vDisk streaming traffic! So while this method allows for intelligent load balancing and true high availability of the TFTP service, I can hardly recommend it since it means changing the default gateways on our critical TFTP/PVS boxes (not to mention the setup is more complicated than any of the previous options we’ve discussed). So in the famous words of MarkT, “there has to be a better way!” Let’s see about that…
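The addressing behind option 5 is easier to follow with a toy model. The sketch below is not NetScaler configuration or any real API; the IP addresses, the `Packet` class and both functions are invented purely to show which source address the TFTP server sees and where its reply ends up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical addresses for illustration only.
CLIENT, VIP, SNIP, SERVER = "10.1.0.50", "10.2.0.100", "10.2.0.5", "10.2.0.11"


def forward_to_server(request: Packet, usip: bool) -> Packet:
    """What the load balancer sends to the backend for a request to the VIP."""
    source = request.src if usip else SNIP  # USIP keeps the client IP
    return Packet(src=source, dst=SERVER)


def reply_path(seen: Packet, gateway_is_lb: bool) -> str:
    """Which hop receives the backend's reply to the packet it saw."""
    if seen.src == SNIP:
        return "load balancer"  # reply is addressed to the SNIP itself
    # Reply is addressed to the client, so it follows the default gateway.
    return "load balancer" if gateway_is_lb else "router (bypasses load balancer)"


request = Packet(src=CLIENT, dst=VIP)
default_mode = forward_to_server(request, usip=False)
usip_mode = forward_to_server(request, usip=True)
```

With USIP on and the default gateway left alone, `reply_path(usip_mode, gateway_is_lb=False)` shows the reply skipping the load balancer, which is why the gateway change is needed and why all other traffic from that box then follows the same path.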
6. NetScaler with DSR. Explaining how the Direct Server Return (DSR) feature works on the NS is a bit out of scope for this blog, but essentially responses from backend servers are sent directly to the requesting client bypassing the NetScaler. It requires a few things to be enabled on the NS that are discussed here. My colleague based out of Germany, Holger Füßler, has also done an outstanding job discussing DSR in his blog here, and specifically how to use DSR to make TFTP highly available for PVS, so I’m really not going to get into it here. But this approach certainly provides high availability of the TFTP service and it has the added advantage that you don’t need to change the default gateway of your TFTP servers. But it comes with drawbacks like anything else – it requires the TFTP servers to be located on the same subnet as the NetScaler appliances, it requires the creation of a loopback adapter with the IP of the load balanced vserver and it’s certainly more complicated than something like DNS RR or Proxy DHCP discussed earlier.
7. NetScaler with SolarWinds. This approach involves disabling or not using the built-in TFTP service on the PVS boxes and standing up two additional VMs with something like SolarWinds. There are many TFTP servers out there, but I’ve seen a few customers use SolarWinds with success – it’s free, multi-threaded and pretty robust. Now we’ve essentially “offloaded” our TFTP service and can serve the PVS bootstrap file from something other than the PVS boxes. If we configure our NetScalers with USIP, we don’t need to touch the default gateways on our PVS boxes – we only need to change the default gateways on our two VMs running SolarWinds. So this has the added benefit of not routing our stream/vDisk traffic through our lovely NetScaler appliances. We also get intelligent load balancing and HA of TFTP. But now we’ve introduced two additional servers in our environment and we are using freeware. Some customers just don’t go for this, but this is a very nice option in my mind if you’re hell bent on getting true high availability from a hardware load balancer and don’t mind spinning up a couple more VMs.
8. BDM. This brings us to our last option – Boot Device Manager (BDM). And I’ve been saving this for last because it’s quite different from the rest and one of my personal favorites. Instead of downloading the bootstrap file from a TFTP server and booting our target devices from the network, the BDM utility provides an optional method for delivering boot information to target devices. You essentially run through a quick wizard and spit out a 300 KB ISO file courtesy of the “Citrix ISO Image Recorder” – then you configure the target devices to boot from the CD/DVD-ROM and specify the location of your ISO on your favorite shared storage repository. And by booting from this small ISO, we’ve effectively bypassed the need for DHCP options, PXE and TFTP. All of a sudden we don’t care so much about making TFTP highly available because we don’t even need it with BDM! We still need DHCP itself to provide IP address information (otherwise we’d have to create an ISO file for each target device which would be painful), but we are now essentially providing our target devices with a “local” bootstrap file (it’s not technically local but I like to think of it that way). This makes BDM even a bit faster than waiting 5-10 seconds for PXE and TFTP to do their thing. But what are the downsides of BDM? BDM gets tricky if your target devices are physical as opposed to virtual. But I really only come across this in the XenApp space when customers are using bare-metal as opposed to virtualization products like XenServer, vSphere and Hyper-V. The other time I’ve run across this is when you’re using PVS for Desktops (no XenDesktop in the mix – just PVS with physical desktops). But 95% of the time these days, customers are deploying virtual XA and XD workloads on some type of hypervisor so BDM works like a charm!
And even in those physical situations, you can use BDM to “burn” the PVS boot information to something other than a CD, DVD or ISO – you can actually carve up a hard drive and make one a BDM partition and burn the boot info to that partition! Pretty cool, but it does require some work so I’d probably consider something simple like DNS RR or Proxy DHCP before I went down that road.
Well there you have it – I’ve detailed 8 of the most popular options that people are using out there when it comes to delivering the PVS bootstrap file and making it highly available. I typically “pitch” BDM to my customers first and then fall back to PVS PXE (Proxy DHCP) and DNS RR if I want to keep it simple or if I don’t have a hardware load balancer. But it all comes down to what you have at your disposal, the network setup and the customer’s requirements (is true high availability a requirement or will simple fault tolerance suffice?). You need to analyze the situation and present the best option that meets your customer’s specific business requirements. Hopefully now you understand that using NetScaler to load balance TFTP isn’t that trivial after all (oh the irony…if only anyone out there knew what TFTP actually stood for). 😉
Let me know if you have any questions or feedback. And the next time someone asks you what options are available to deliver a PVS bootstrap or make TFTP highly available, hopefully you’ll be able to rattle off about 10 options with the pros and cons of each approach!