Reboot Schedule Internals

Support for reboot schedules in the current generation of XenApp & XenDesktop products was introduced back in version 7.0; however, these schedules work differently from those in XenApp 6.5. In this post, I hope to add some technical detail on how these schedules operate in the 7.x products and perhaps dispel a few urban myths about them.

You do not need this information to use reboot schedules, and it’s most definitely not a User Guide, but if you want to understand why they behave as they do, this article might help.

Reboot schedules in 7.x support many configurations, each requiring different behaviour, but here we’ll cover a common case that roughly corresponds to a reboot schedule in XenApp 6.5: a scheduled reboot of a delivery group containing power-managed RDS (Server OS) VDAs.

It’s important to distinguish a reboot schedule (the delivery group to reboot, when, and over how long) from a reboot cycle (a single occurrence of that schedule – for example, the one that started at 2am last Wednesday). This terminology is used in the SDK, as in Get-BrokerRebootSchedule[V2] (the V2 variant was added in 7.12 to support multiple schedules for a delivery group) and Get-BrokerRebootCycle.

In this example, we’ll consider a scheduled reboot of a delivery group containing 60 power-managed RDS VDAs where the scheduled duration is 2 hours. Ten of these VDAs are off at the start of the cycle.

When a reboot cycle starts for this schedule, the interval between VDA reboots is calculated by dividing the scheduled duration by the total number of machines in the group (or, if the schedule is restricted by tag in 7.12 or later, by the number of machines having that tag). In our case this is 120 minutes / 60 = 2 minutes.
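The interval calculation is simple arithmetic. The following is a minimal Python sketch of it; the Machine class, its tags field and the reboot_interval function are hypothetical names used for illustration only, not part of the Broker SDK.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Machine:  # hypothetical stand-in for a machine in the delivery group
    name: str
    tags: set[str] = field(default_factory=set)


def reboot_interval(duration_minutes: float, machines: list[Machine],
                    tag: str | None = None) -> float:
    """Minutes between successive VDA reboot selections in a cycle.

    Every machine in the delivery group counts, including powered-off ones.
    If the schedule is restricted by tag (7.12+), only tagged machines count.
    """
    if tag is not None:
        machines = [m for m in machines if tag in m.tags]
    if not machines:
        raise ValueError("no machines fall within this schedule")
    return duration_minutes / len(machines)


# The article's example: 60 VDAs and a 2-hour scheduled duration.
group = [Machine(f"VDA{i:02d}") for i in range(1, 61)]
print(reboot_interval(120, group))  # 2.0

# Hypothetical tag restriction: only the first 30 machines carry "Nightly".
for m in group[:30]:
    m.tags.add("Nightly")
print(reboot_interval(120, group, tag="Nightly"))  # 4.0
```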

The VDAs to be rebooted are divided equally between two phases, with the most preferred VDAs in phase 1 and the rest in phase 2. Factors that make a VDA preferred include having fewer user sessions, currently being registered, and not being in maintenance mode. VDAs that are powered off are not included and are immediately counted as skipped. In our example this leaves 50 VDAs, with 25 in each phase.
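The phase split can be pictured as "drop the powered-off VDAs, rank the rest, cut the ranking in half". This is a minimal sketch under stated assumptions: the article lists the preference factors but not their precedence, so ranking unregistered VDAs ahead of session count is an assumption (maintenance mode goes first because it is always least preferred), as is giving any odd VDA to phase 1. Vda, preference_key and split_into_phases are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Vda:  # hypothetical model of a VDA's state at the start of a cycle
    name: str
    powered_on: bool = True
    sessions: int = 0
    registered: bool = True
    maintenance: bool = False


def preference_key(v: Vda) -> tuple[bool, bool, int]:
    # Lower sorts first, i.e. more preferred. Maintenance mode is always least
    # preferred; ranking "unregistered" above session count is an assumption.
    return (v.maintenance, not v.registered, v.sessions)


def split_into_phases(vdas: list[Vda]) -> tuple[list[Vda], list[Vda], list[Vda]]:
    """Return (phase1, phase2, skipped) for the start of a reboot cycle."""
    skipped = [v for v in vdas if not v.powered_on]  # counted as skipped at once
    ranked = sorted((v for v in vdas if v.powered_on), key=preference_key)
    half = (len(ranked) + 1) // 2  # assumption: an odd VDA out goes to phase 1
    return ranked[:half], ranked[half:], skipped


# 60 VDAs, the first ten powered off, with varying session counts.
vdas = [Vda(f"VDA{i:02d}", powered_on=i > 10, sessions=i % 7) for i in range(1, 61)]
phase1, phase2, skipped = split_into_phases(vdas)
print(len(phase1), len(phase2), len(skipped))  # 25 25 10
```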

We now start to reboot VDAs in phase 1. Nothing happens to VDAs in phase 2 yet.

Cycle Phase 1

Firstly, all VDAs in the phase are put into drain mode; this prevents new sessions from being launched but still allows existing sessions to be reconnected. The overall status of a VDA in a reboot cycle, and in particular whether it’s in drain mode, can be seen in the ScheduledReboot property of the objects returned by the Get-BrokerMachine cmdlet.

Now, in our example, every 2 minutes for the next 50 minutes, one VDA in the phase is picked for reboot. Each selection uses the same criteria used to divide the VDAs into phases, but the preferred order of VDAs within the phase is re-evaluated on every selection, so the VDA that is ‘most preferred’ at that moment is always chosen. This is a major difference from XenApp 6.5, where the VDA reboot order is fully determined ahead of time.
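Re-evaluating the order on every selection means a VDA whose users log off mid-phase can jump the queue. The minimal Python sketch below illustrates this; the ranking tuple and the names Vda and pick_next are hypothetical and only approximate the factors the article lists.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Vda:  # hypothetical model of a VDA's current state
    name: str
    sessions: int
    registered: bool = True
    maintenance: bool = False


def preference_key(v: Vda) -> tuple[bool, bool, int]:
    # Hypothetical ranking: maintenance mode last, then unregistered, then sessions.
    return (v.maintenance, not v.registered, v.sessions)


def pick_next(remaining: list[Vda]) -> Vda:
    """Remove and return the most preferred VDA, judged on its state right now."""
    best = min(range(len(remaining)), key=lambda i: preference_key(remaining[i]))
    return remaining.pop(best)


phase = [Vda("A", sessions=3), Vda("B", sessions=1), Vda("C", sessions=2)]
order = [pick_next(phase).name]      # B has the fewest sessions at the first pick
phase[0].sessions = 0                # A's users log off before the next pick
order.append(pick_next(phase).name)  # A now beats C
order.append(pick_next(phase).name)
print(order)  # ['B', 'A', 'C']
```

Fixing the order once at the start of the phase would have given B, C, A instead, leaving the now-idle VDA A until last.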

As each VDA is picked, the following occurs:

If, and only if, there are user sessions present and the schedule includes a warning message, the message is sent to the VDA (possibly repeatedly) until either the warning duration is reached or all sessions have logged off.
A shutdown request is sent to the VDA’s hypervisor.
When the VDA has powered off, it is taken out of drain mode and a power-on request is sent to its hypervisor.
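The per-VDA steps above can be sketched as a small simulation that lists the actions taken for one picked VDA. This is illustrative only: the function reboot_steps, the sessions_at callback and the repeat_every warning interval are hypothetical, time is modelled in whole minutes, and no real hypervisor or Broker call is made.

```python
from __future__ import annotations
from typing import Callable


def reboot_steps(sessions_at: Callable[[int], int], warning_message: str | None,
                 warning_minutes: int, repeat_every: int = 5) -> list[str]:
    """List the actions taken for one picked VDA.

    sessions_at(t) gives the session count t minutes after the VDA is picked.
    repeat_every is a hypothetical warning repeat interval, in minutes.
    """
    steps: list[str] = []
    t = 0
    if sessions_at(0) > 0 and warning_message:
        # Warn (possibly repeatedly) until the warning duration is reached
        # or every session has logged off, whichever comes first.
        while t < warning_minutes and sessions_at(t) > 0:
            if t % repeat_every == 0:
                steps.append(f"t={t}: send warning {warning_message!r}")
            t += 1
    steps.append(f"t={t}: request shutdown from hypervisor")
    steps.append("after power-off: clear drain mode, request power-on")
    return steps


# Two sessions that log off 7 minutes after the VDA is picked; 15-minute warning.
for line in reboot_steps(lambda t: 2 if t < 7 else 0, "Rebooting soon", 15):
    print(line)
```

Because the shutdown can wait for up to the whole warning duration, a VDA picked late in a phase may not start rebooting until some time after it was picked.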

This work typically overlaps, running concurrently for multiple VDAs within the phase. Once all VDAs in the phase have been picked (after 50 minutes in our example), the phase is complete. At this point many, but typically not all, VDAs have completed their reboots and are available for use again.

Cycle Checkpoint

Once phase 1 is complete, a check is made to see if at least one VDA has successfully re-registered and is available to accept new sessions. If not, the cycle pauses until at least one is available again. If none are available after a timeout period, the cycle is abandoned (VDAs in phase 2 aren’t rebooted).

This check avoids the possibility of rendering all VDAs in a delivery group unusable by, for example, rolling out a faulty image using MCS. By default, we’ll wait a maximum of 30 minutes plus the configured warning duration for a VDA to become available.
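The checkpoint decision reduces to comparing the wait against a limit of 30 minutes plus the warning duration. Here is a minimal Python sketch; the function name checkpoint, its parameters and the inclusive boundary (exactly at the limit still counts as available) are assumptions for illustration.

```python
from __future__ import annotations


def checkpoint(first_available_after: float | None, warning_minutes: float,
               base_wait_minutes: float = 30) -> str:
    """Decide what happens once phase 1 completes (hypothetical helper).

    first_available_after: minutes after phase 1 completes at which a phase-1
    VDA is registered and accepting sessions (0 if one already is, None if
    none ever becomes available).
    """
    limit = base_wait_minutes + warning_minutes  # default: 30 min + warning
    if first_available_after is not None and first_available_after <= limit:
        return f"start phase 2 after waiting {first_available_after} min"
    return "abandon cycle: phase 2 VDAs are not rebooted"


print(checkpoint(0, warning_minutes=10))     # phase 2 starts immediately
print(checkpoint(45, warning_minutes=10))    # abandoned: the limit is 40 minutes
print(checkpoint(None, warning_minutes=10))  # abandoned: nothing came back
```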

Cycle Phase 2

Once a VDA from phase 1 is available again, phase 2 starts. All VDAs in phase 2 are placed in drain mode and each is processed in turn as described for phase 1 above.

Thus, in our example, a VDA is picked every 2 minutes until, after 50 minutes, all VDAs in phase 2 have been selected; by then many of them would typically have rebooted and be available again. At this point phase 2 is complete.

We now wait to allow outstanding operations to complete so that we have accurate counts of successfully rebooted VDAs, and of reboots that failed. These counts are shown in the output of Get-BrokerRebootCycle. Again, by default we’ll wait a maximum of 30 minutes plus the configured warning duration for operations to complete.

At this point the cycle is complete.
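Putting the stages together, the nominal timeline of a cycle can be estimated from the phase sizes, the interval, any checkpoint delay and the final-wait limit. The following is a rough sketch, not the Broker's actual scheduling code: cycle_timeline and its parameters are hypothetical, checkpoint abandonment is not modelled, and latest_complete is only an upper bound, since the cycle ends as soon as outstanding operations finish.

```python
def cycle_timeline(phase1_count: int, phase2_count: int, interval: float,
                   checkpoint_delay: float = 0.0, warning_minutes: float = 0.0,
                   base_wait: float = 30.0) -> dict[str, float]:
    """Minutes from cycle start at which each stage ends (nominal estimate)."""
    phase1_end = phase1_count * interval          # all phase-1 VDAs picked
    phase2_start = phase1_end + checkpoint_delay  # a phase-1 VDA is available
    phase2_end = phase2_start + phase2_count * interval
    # Upper bound on the final wait for outstanding operations to complete.
    latest_complete = phase2_end + base_wait + warning_minutes
    return {"phase1_end": phase1_end, "phase2_start": phase2_start,
            "phase2_end": phase2_end, "latest_complete": latest_complete}


# The article's example: 25 VDAs per phase, 2-minute interval, no delays.
print(cycle_timeline(25, 25, 2.0))
# {'phase1_end': 50.0, 'phase2_start': 50.0, 'phase2_end': 100.0, 'latest_complete': 130.0}
```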

Pictorial Example

The picture below shows a smaller reboot cycle example with only ten VDAs. VDAs 1 and 2 are rebooted first because they have no sessions present (so no message is sent), and both are available again when the checkpoint is reached, so phase 2 starts immediately. In this case, because the warning duration is longer than the interval between reboots, neither VDA 9 nor VDA 10 starts to reboot until after the nominal end time of the cycle.

Variations on a Theme

While our example covers power-managed RDS VDAs, physical RDS VDAs can also be rebooted in a reboot cycle (the VDA restarts Windows at the request of the Controller). However, physical VDI (Desktop OS) VDAs cannot be rebooted by a reboot cycle.

Reboot cycles for shared VDI VDAs are generally similar to our RDS example, but here the VDAs are not powered on again once shut down. It is left to the delivery group’s power management policies to restart the VDAs as required.

Any Questions?

What happens to VDAs in Maintenance Mode?

VDAs in maintenance mode are always the least preferred when picking one to reboot and, if they remain in maintenance mode, are left to the end of their phase. Any VDAs still in maintenance mode at that point are skipped and not rebooted. So a VDA that is in maintenance mode only briefly during a cycle is likely to be rebooted, but one that stays in maintenance mode throughout is not.
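The maintenance-mode behaviour can be illustrated with a small simulation of one phase. This is a simplified sketch, not the Broker's implementation: run_phase, the changes mapping used to model an administrator toggling maintenance mode between picks, and the (maintenance, sessions) ranking are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Vda:  # hypothetical model of a VDA in one phase
    name: str
    sessions: int = 0
    maintenance: bool = False


def run_phase(vdas: list[Vda],
              changes: dict[int, dict[str, bool]]) -> tuple[list[str], list[str]]:
    """Simulate picks within one phase; return (rebooted, skipped) names.

    changes[k] maps VDA names to a new maintenance-mode flag, applied just
    before pick number k (a hypothetical way to model admin actions).
    """
    remaining = list(vdas)
    rebooted: list[str] = []
    pick = 0
    while remaining:
        for v in remaining:
            v.maintenance = changes.get(pick, {}).get(v.name, v.maintenance)
        # Maintenance mode always ranks last, then fewest sessions.
        best = min(remaining, key=lambda v: (v.maintenance, v.sessions))
        if best.maintenance:
            # Only maintenance-mode VDAs are left at the end of the phase.
            break
        remaining = [v for v in remaining if v is not best]
        rebooted.append(best.name)
        pick += 1
    return rebooted, [v.name for v in remaining]


vdas = [Vda("A", 0, maintenance=True), Vda("B", 1), Vda("C", 2),
        Vda("D", 3, maintenance=True)]
# A leaves maintenance mode before the second pick; D stays in it throughout.
print(run_phase(vdas, {1: {"A": False}}))  # (['B', 'A', 'C'], ['D'])
```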

Is Drain Mode the same as Maintenance Mode?

No. A VDA can be placed in maintenance mode by the administrator to prevent automatic power management (including that of reboot cycles). Drain mode is different and is exclusively controlled by a reboot cycle. It cannot be set or cleared by the administrator.

Reboot cycle drain mode is also distinct from the Terminal Services drain mode supported by Windows RDS machines (which reboot cycles do not use).

Why are VDAs with the fewest sessions rebooted first?

The algorithm is designed to minimise disruption to end users. By rebooting unused or lightly loaded VDAs early it tries to ensure that later in the cycle, as users are forced off more heavily used VDAs, as many newly restarted VDAs as possible are available to accept new sessions.

Why are VDAs that are off ignored?

The objective of a reboot schedule is to periodically restart the VDAs’ operating systems. This resets their memory to a clean state, possibly resets disk images, potentially aborts hung processes and removes unused sessions. However, if a VDA is already off, no action is required, as it will be in a clean state when it next starts.

Powered-off VDAs are, however, counted when initially calculating the interval between reboots. This means the reboot rate stays the same even if a large number of VDAs happen to be off, and the cycle may finish earlier than its scheduled duration.
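With the article’s numbers, the effect is easy to quantify. In this minimal sketch (nominal_finish is a hypothetical helper that ignores warning durations, the checkpoint and the final wait), the interval stays at 2 minutes, so the ten powered-off VDAs shorten the picking period from 120 to 100 minutes.

```python
def nominal_finish(total_machines: int, powered_off: int, scheduled_minutes: float) -> float:
    """Minutes until the last VDA is picked, ignoring warnings and the checkpoint."""
    interval = scheduled_minutes / total_machines  # powered-off VDAs still count
    return (total_machines - powered_off) * interval


print(nominal_finish(60, 0, 120))   # 120.0: every VDA is on
print(nominal_finish(60, 10, 120))  # 100.0: same 2-minute rate, 50 VDAs to reboot
```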

Why are reboot cycles sometimes shorter or longer than the scheduled duration?

A cycle may be shorter than scheduled because some VDAs are off, but a warning duration, the checkpoint at the end of phase 1, or the wait for outstanding operations to complete at the end of phase 2 may extend a cycle.
