During my deep dive into the design benchmarks of Desktop Transformation Accelerator for my last blog post about application delivery, I noticed that redundancy doesn’t seem to be a priority for some customers. In fact, even in large implementations with 2,000+ users, infrastructure components are not always implemented in a fault-tolerant way. Therefore, I’m going to write several blog posts dedicated to redundancy in virtual desktop environments. In this first post, I’d like to focus on the Desktop Delivery Layer.

Let’s take a look at what Accelerator tells us about the redundancy options chosen by customers.

Chart 1 – Number of desktop delivery controllers (XenDesktop Brokers) per site, by % of projects

In 17% of Accelerator projects, only one desktop delivery controller is implemented.

Why is this bad?

If this single controller fails, users will not be able to connect or reconnect to any virtual desktop (existing sessions are not affected). One could argue that fault tolerance can be achieved even with a single controller by activating the HA mode of the Virtual Desktop Agent, as described in the corresponding eDocs article. However, this workaround has two major downsides:

  • It works only with dedicated desktops, where the mapping between the user and the Virtual Desktop Agent is known.
  • Users need to be provided with a custom ICA file, which is used when the controller is down. This procedure is not very intuitive, and managing hundreds of ICA files is a maintenance nightmare.

Therefore, the best practice is to implement a minimum of two controllers per XenDesktop site, regardless of the size of the environment.
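
To illustrate the rule, here is a minimal Python sketch that flags sites with fewer than two controllers. The inventory format, site names, and host names are assumptions made up for this example; nothing here queries a real XenDesktop site.

```python
from typing import Dict, List

MIN_CONTROLLERS_PER_SITE = 2  # the best practice stated above


def sites_lacking_redundancy(controllers_by_site: Dict[str, List[str]]) -> List[str]:
    """Return the names of sites with fewer than two distinct controllers."""
    return sorted(
        site
        for site, controllers in controllers_by_site.items()
        # A set ignores a controller that was accidentally listed twice.
        if len(set(controllers)) < MIN_CONTROLLERS_PER_SITE
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "London": ["ddc01", "ddc02"],
    "Zurich": ["ddc03"],
}
print(sites_lacking_redundancy(inventory))
```

A check like this could run against whatever inventory list you already maintain, so that a single-controller site is noticed before an outage points it out.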

The second item I’d like to discuss is the redundancy of the Microsoft SQL database.

Chart 2 – Level of fault tolerance for the SQL DB, by % of projects

In 33% of all Accelerator projects, no fault tolerance has been implemented, and in a further 15% of projects the fault tolerance is weak (VM-level HA).

What’s the problem here?

In XenDesktop, all information is stored in the database; controllers communicate only with the database and not with each other. A controller can therefore be unplugged or turned off without affecting the other controllers in the site. The downside of this design is that the database becomes a single point of failure. If the database server fails, existing connections to virtual desktops continue to function until the user either logs off or disconnects, but no new connections can be established while the database server is unavailable.

Therefore, our best practice is to implement either SQL Clustering or SQL Mirroring in order to ensure automatic failover and continuous service. While VM-level HA can be seen as a fault tolerance solution, one of its major downsides is that it only kicks in if the whole SQL server fails (e.g. a Blue Screen), but not if “just” the SQL service fails. Furthermore, the SQL service stays down until the automatic reboot of the SQL server has completed. More information about this topic can be found in the eDocs article “XenDesktop HA Planning”.
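
With SQL Mirroring, a client generally has to know the mirror server in order to reconnect after a failover. In ADO.NET-style connection strings this is expressed with the `Failover Partner` keyword. The Python sketch below only assembles such a string; the server and database names are hypothetical, and how the string is applied to the XenDesktop services is not shown here and should be taken from the eDocs.

```python
from typing import Optional


def build_connection_string(principal: str, database: str,
                            mirror: Optional[str] = None) -> str:
    """Assemble an ADO.NET-style SQL Server connection string.

    If a mirror server is given, it is added as the Failover Partner so a
    client can try the mirror when the principal is unreachable.
    """
    parts = [
        f"Server={principal}",
        f"Initial Catalog={database}",
        "Integrated Security=True",  # assumes Windows authentication
    ]
    if mirror:
        parts.append(f"Failover Partner={mirror}")
    return ";".join(parts)


# Hypothetical names, for illustration only.
print(build_connection_string("sql01", "XenDesktopDB", mirror="sql02"))
```

With SQL Clustering no failover partner is needed, because clients connect to a single virtual server name that moves between the cluster nodes.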

The next and final topic I’d like to outline here is the backup of the XenDesktop SQL database.

Chart 3 – Backup of the SQL DB, by % of projects

In 12% of Accelerator projects, the SQL DB is not backed up.

Why is this an issue, and do I need to back up my DB if I use SQL clustering?

As discussed earlier, the SQL DB is a vital piece of every XenDesktop infrastructure, which needs to be protected and handled with care. While clustering or mirroring the SQL DB helps keep the service up in case of a single server outage, it does not protect against logical errors. So, in order to be able to recover if a hotfix or a SQL script damages the contents of the DB, we need a recent backup. Citrix best practice is to perform a full backup every day and keep backups for up to six months by following the Grandfather-Father-Son principle.
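
The Grandfather-Father-Son principle can be sketched in a few lines of Python. Only the daily full backup and the six-month upper bound come from the text above; the retention windows for sons and fathers, and the choice of Sunday and the first of the month as the weekly and monthly backups, are assumptions for illustration.

```python
from datetime import date
from typing import List

# Assumed retention windows; only the six-month bound is from the text.
SON_DAYS = 7            # every daily backup is kept for one week
FATHER_DAYS = 35        # Sunday backups are kept for five weeks
GRANDFATHER_DAYS = 183  # first-of-month backups are kept about six months


def keep_backup(backup_day: date, today: date) -> bool:
    """Decide whether the full backup taken on backup_day is still retained."""
    age = (today - backup_day).days
    if age < 0:
        return True  # future-dated backup: keep it and investigate
    if age <= SON_DAYS:
        return True  # son: recent daily backup
    if backup_day.weekday() == 6 and age <= FATHER_DAYS:
        return True  # father: weekly (Sunday) backup
    if backup_day.day == 1 and age <= GRANDFATHER_DAYS:
        return True  # grandfather: monthly backup
    return False


def backups_to_keep(backup_days: List[date], today: date) -> List[date]:
    """Return the retained backup dates, oldest first."""
    return sorted(d for d in backup_days if keep_backup(d, today))
```

The benefit of this scheme is that recent days can be restored at daily granularity, while older restore points thin out to weekly and then monthly, which keeps storage consumption manageable over the six-month window.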

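Another positive side effect of backing up the DB regularly is that it keeps the transaction log in check. Under the full recovery model, a backup that includes the transaction log allows SQL Server to truncate the log and reuse its space, which prevents the SQL server from running out of disk space. Further information can be found here: XenDesktop 5 Database Transaction Log Growing Excessively.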

For further information about High Availability / Fault Tolerance of the Desktop Delivery Layer, please refer to the following guides:

If you’re about to start a XenDesktop project and you would like to accelerate your decision-making process, create a project in the Desktop Transformation Accelerator and benefit from the input of your peers.