In the IT world we should all be aware of what happens if you lose a datacenter. From both a business and a technical perspective, this possibility leaves us vulnerable to a costly outage or even the failure of the business itself! There are many reasons this can happen. Mother Nature and human error show us time and again that many different unplanned events can hit your datacenter.
With XenServer Integrated Site Recovery we provide a tool that helps you get your servers, virtual desktops, databases and other workloads back online.
How does this XenServer feature work?
First of all, you need a shared storage system that is capable of replicating data from Datacenter A to Datacenter B. How often the data is replicated is your decision, and it is usually governed by the recovery point and recovery time objectives (RPO/RTO) defined for your business. Distance doesn't really matter: with NetApp SnapMirror, for example, you can replicate over hundreds of miles, and by leveraging Citrix CloudBridge we can even help you compress any data that needs to be transferred over the WAN!
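To make the RPO trade-off concrete, here is a minimal Python sketch of a deliberately simplified model of asynchronous, interval-based replication. It assumes that changed data is shipped once per interval and that WAN optimisation such as CloudBridge can be approximated by a single compression ratio. All names and figures are hypothetical and do not come from any NetApp or Citrix tool. In this model, the worst-case data loss is one replication interval plus the time needed to transfer that interval's changes.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    """Hypothetical inputs for a simple interval-based replication estimate."""
    interval_min: float            # how often the array ships changed data
    change_rate_gb_per_h: float    # data written to the protected SRs per hour
    wan_mbit_per_s: float          # usable WAN bandwidth between the sites
    compression_ratio: float = 1.0 # assumed: 2.0 means WAN optimisation halves the data


def transfer_minutes(plan: ReplicationPlan) -> float:
    """Minutes needed to ship one interval's worth of changed data."""
    changed_gb = plan.change_rate_gb_per_h * plan.interval_min / 60
    wire_gb = changed_gb / plan.compression_ratio
    megabits = wire_gb * 8 * 1000  # decimal units: 1 GB = 8000 Mbit
    return megabits / plan.wan_mbit_per_s / 60


def worst_case_data_loss_min(plan: ReplicationPlan) -> float:
    """Disaster strikes just before a transfer completes, so the newest usable
    copy at the recovery site is one interval plus one transfer time old."""
    return plan.interval_min + transfer_minutes(plan)


def meets_rpo(plan: ReplicationPlan, rpo_min: float) -> bool:
    # Replication must keep up (each transfer ends before the next interval)
    # and the worst-case loss window must fit inside the RPO.
    keeps_up = transfer_minutes(plan) < plan.interval_min
    return keeps_up and worst_case_data_loss_min(plan) <= rpo_min


if __name__ == "__main__":
    plan = ReplicationPlan(interval_min=15, change_rate_gb_per_h=20,
                           wan_mbit_per_s=100, compression_ratio=2.0)
    print(round(worst_case_data_loss_min(plan), 1))  # 18.3
    print(meets_rpo(plan, rpo_min=30))               # True
```

Real arrays add snapshot overhead and bursty change rates, so treat a result like this as a lower bound rather than a guarantee.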
After you set up your shared Storage Repository (SR) in your primary datacenter, you can configure this SR for Disaster Recovery, as shown in the graphic below. SRs used for the Disaster Recovery feature can be attached via iSCSI or Fibre Channel. Disaster Recovery cannot be configured if your shared storage uses integrated StorageLink or NFS.
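The storage restriction above can be expressed as a small eligibility check. This Python sketch assumes that XenServer reports these SRs with the type strings `lvmoiscsi` (iSCSI), `lvmohba` (Fibre Channel), `nfs` and `cslg` (StorageLink). Treat those strings as an assumption and check them against the types your own pool reports. The function name is hypothetical.

```python
# SR type strings are assumptions; verify them against your own pool.
DR_CAPABLE_TYPES = {
    "lvmoiscsi": "iSCSI",
    "lvmohba": "Fibre Channel (HBA)",
}
NOT_DR_CAPABLE_TYPES = {
    "nfs": "NFS",
    "cslg": "integrated StorageLink",
}


def dr_eligibility(sr_type: str) -> tuple[bool, str]:
    """Return (eligible, reason) for an SR type string."""
    t = sr_type.lower()
    if t in DR_CAPABLE_TYPES:
        return True, f"{DR_CAPABLE_TYPES[t]} SRs can be configured for DR"
    if t in NOT_DR_CAPABLE_TYPES:
        return False, f"{NOT_DR_CAPABLE_TYPES[t]} SRs cannot be configured for DR"
    # Be conservative with anything this sketch does not know about.
    return False, f"unknown SR type {sr_type!r}; check the Administration Guide"


if __name__ == "__main__":
    for sr_type in ("lvmoiscsi", "nfs", "cslg"):
        print(sr_type, dr_eligibility(sr_type))
```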
This process creates a Metadata Backup of the VMs on the selected SR.
Once DR is configured, the Metadata Backup on the selected SR is kept up to date with information about the resource pool and its virtual machines.
Part of the DR planning process involves deciding which workloads should be prioritised for recovery in the event of a disaster. This list is typically a small subset of the total workloads hosted, and it will affect the design of your storage tier. For example, you may decide that only your 25 most critical services need to be available at the recovery site, and that they can run at a lower level of service during a DR event. This decision may lead you to a design with just 2 or 3 'DR SRs' at the primary site, running on tier 1 storage and replicated to a lower tier of commodity storage at the recovery site. Many factors will affect the DR design process, but identifying the workloads for recovery and their respective priorities is a key one.
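A prioritised workload list can be checked against the DR SR layout with a few lines of code. This Python sketch is illustrative only. It assumes a hypothetical inventory in which each workload has a priority (1 = most critical) and the name of the DR SR that holds its disks (`None` if it is not on a DR SR). It derives a recovery order, lists the DR SRs that are needed, and flags critical workloads that would not be recovered because their disks sit on unprotected storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    name: str
    priority: int          # hypothetical scale: 1 = most critical
    dr_sr: Optional[str]   # DR SR holding the VM's disks, or None if unprotected


def recovery_order(workloads: list[Workload]) -> list[str]:
    """Protected workloads, most critical first (ties broken by name)."""
    protected = [w for w in workloads if w.dr_sr is not None]
    return [w.name for w in sorted(protected, key=lambda w: (w.priority, w.name))]


def dr_srs_needed(workloads: list[Workload]) -> list[str]:
    """The set of DR SRs the design must replicate."""
    return sorted({w.dr_sr for w in workloads if w.dr_sr is not None})


def unprotected_critical(workloads: list[Workload], up_to_priority: int = 1) -> list[str]:
    """Critical workloads whose disks are not on any DR SR (a design gap)."""
    return [w.name for w in workloads
            if w.priority <= up_to_priority and w.dr_sr is None]


if __name__ == "__main__":
    inventory = [
        Workload("sql01", 1, "DR-SR-1"),
        Workload("dc01", 1, "DR-SR-1"),
        Workload("web01", 2, "DR-SR-2"),
        Workload("build01", 3, None),
        Workload("erp01", 1, None),
    ]
    print(recovery_order(inventory))        # ['dc01', 'sql01', 'web01']
    print(dr_srs_needed(inventory))         # ['DR-SR-1', 'DR-SR-2']
    print(unprotected_critical(inventory))  # ['erp01']
```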
The XenServer resource pool(s) at the DR site could be installed and configured only when you need them. Ideally, though, they should be production XenServers that are online and ready to use, to avoid delays in any recovery process. This choice will also be affected by your desired RPO/RTO. If your recovery times are aggressive, the XenServer resource pools at the recovery site should be subject to the same rigorous management, monitoring and change control processes that are in place for the production site. This ensures that the state of the DR infrastructure is well known and helps avoid complications or delays during recovery.
Important factors to consider during DR configuration are that the XenServer version, patch level and CPU family are identical at both sites, and that sufficient resources are available at the recovery site to host the workloads prioritised for DR. These resources should be sufficient to meet the performance criteria laid down in any business or technical SLAs that are in place. These criteria may be relaxed during a DR scenario, but your customers need to be made aware of any differences!
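These checks lend themselves to a pre-flight script run as part of regular change control. The Python sketch below is hypothetical. It assumes you have already collected each pool's version, applied patches, CPU family (simplified here to a single label) and free capacity by whatever means you use. It reports every mismatch or shortfall between the primary pool and the recovery pool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PoolFacts:
    """Facts about one resource pool (hypothetical structure for illustration)."""
    version: str
    patches: frozenset[str]
    cpu_family: str        # simplified to one label for this sketch
    free_vcpus: int
    free_mem_gb: float


def preflight(primary: PoolFacts, recovery: PoolFacts,
              need_vcpus: int, need_mem_gb: float) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if primary.version != recovery.version:
        problems.append(f"version mismatch: {primary.version} vs {recovery.version}")
    if primary.patches != recovery.patches:
        missing = sorted(primary.patches - recovery.patches)
        extra = sorted(recovery.patches - primary.patches)
        problems.append(f"patch level differs: missing {missing}, extra {extra}")
    if primary.cpu_family != recovery.cpu_family:
        problems.append(f"CPU family differs: {primary.cpu_family} vs {recovery.cpu_family}")
    # Capacity is checked against the prioritised DR workloads only.
    if recovery.free_vcpus < need_vcpus:
        problems.append(f"not enough vCPUs: {recovery.free_vcpus} < {need_vcpus}")
    if recovery.free_mem_gb < need_mem_gb:
        problems.append(f"not enough memory: {recovery.free_mem_gb} GB < {need_mem_gb} GB")
    return problems


if __name__ == "__main__":
    primary = PoolFacts("6.0.0", frozenset({"HF-1", "HF-2"}), "family-A", 64, 256.0)
    recovery = PoolFacts("6.0.0", frozenset({"HF-1"}), "family-A", 16, 192.0)
    for problem in preflight(primary, recovery, need_vcpus=40, need_mem_gb=160):
        print(problem)
```

The version string, patch names and CPU family label are placeholders. Whether "identical" should also cover patches applied only at the recovery site is a policy choice, and this sketch treats any difference as a problem.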
The day of truth! We hope it never comes to this, but one day you may need to fail over to your recovery datacenter.
First, enable your DR storage by making it writable, and make sure LUN mapping and zoning are correct. Second, launch the Disaster Recovery (iDR) wizard in XenCenter.
The wizard probes the selected storage for the required Storage Repositories, and you then choose the SRs that will be attached at the disaster recovery site. XenServer reads the previously saved metadata from the SR and asks you to confirm which VMs you want to restore.
XenServer then attaches the SR, creates the VMs from the previously saved VM metadata and starts the VMs you selected.
You can also use the Disaster Recovery wizard to run test failovers for non-disruptive testing of your disaster recovery system. In a test failover, all the steps are the same as for failover, but the VMs are not started up after they have been recovered to the DR site, and cleanup is performed when the test is finished to remove all VMs and storage recreated on the DR site.
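The wizard's sequence for a real failover and for a test failover can be modelled as an ordered list of steps. This Python sketch is a simplified, hypothetical model. It follows the order described in the text, takes the saved metadata as a mapping from SR name to VM names, and only creates the VMs you selected. It does not call XenServer; the actual work is done by the XenCenter wizard, after the storage has been made writable on the array.

```python
def failover_steps(metadata: dict[str, list[str]], selected: set[str],
                   test: bool = False) -> list[str]:
    """Ordered steps of a failover (or a test failover if test=True)."""
    srs = sorted(metadata)
    # Only VMs found in the saved metadata AND selected by the user are recovered.
    vms = sorted(vm for sr in srs for vm in metadata[sr] if vm in selected)

    steps = ["probe storage for DR SRs"]
    steps += [f"read metadata from SR {sr}" for sr in srs]
    steps += [f"attach SR {sr}" for sr in srs]
    steps += [f"create VM {vm}" for vm in vms]
    if test:
        # Test failover: VMs are never started, and everything recreated
        # at the DR site is removed again (VMs first, then their SRs).
        steps += [f"delete VM {vm}" for vm in vms]
        steps += [f"forget SR {sr}" for sr in srs]
    else:
        steps += [f"start VM {vm}" for vm in vms]
    return steps


if __name__ == "__main__":
    saved = {"DR-SR-1": ["sql01", "dc01"], "DR-SR-2": ["web01"]}
    for step in failover_steps(saved, {"sql01", "web01"}, test=True):
        print(step)
```

Removing the VMs before forgetting their SRs is this sketch's own ordering choice: it avoids leaving VM records that point at storage that is no longer known to the pool.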
Failback works in much the same way. You set up storage replication in the reverse direction, from the recovery site back to the primary site, and configure the SRs for Disaster Recovery. When you judge it is a good time to cut over, start the failback process by launching the Disaster Recovery wizard. Choose failback, probe for your SRs, choose the VMs to be restored, and you are done.
The only thing left is to clean up your DR site (forget SRs and delete VMs).
All of this can also be achieved using the CLI. This Citrix knowledge base article, although specific to XenServer 5.x, contains some useful information on using the CLI to script the backup of pool and virtual machine metadata.
All of this functionality is provided with XenServer Platinum. Please see our XenServer Administration Guide for more information.