This blog is the 7th in a series of blogs by the engineering team responsible for the creation of the app orchestration technology. Today, I will cover the topic of patching workload machines.

As you may remember from a previous blog, all workload machines in a workload catalog are identical. After deploying a workload catalog, you may want to make changes such as upgrading hardware, installing updates to applications or to the Windows OS, or deploying additional applications from the workload catalog.  Alternatively, you may wish to downgrade machines in a workload catalog by, for example, removing surplus memory or uninstalling some applications. App orchestration makes it easy via workload versioning.

While a workload machine is being upgraded, files in use may change, the machine may carry additional load, or the update may even require a reboot. At the same time, the workload machine may be hosting user sessions. To avoid interfering with or interrupting end users, app orchestration provides a seamless mechanism to manage the workload catalog version update automatically.

First of all, an admin initiates the versioning process by updating the version of the current workload catalog. Click the “Create New Version” button in the UI to start.  The new version keeps all the current settings and machines from the workload catalog but with a different workload machine import OU. This is the initialization stage.

Next, newer-version workload machines are dropped into the new workload machine import OU. App orchestration scans one of the newer machines to discover its capabilities, such as XA version, hardware info, and installed applications. App orchestration then checks whether all advertisements on the previous version of the catalog are still available on the newer version. This is the validation stage.

Before the validation stage gets started, the admin may decide to abort the process by clicking the “Abort update” button on the workload catalog. This will return the catalog to its previous settings, including the previous workload machine import OU.  Once the validation begins, the admin will no longer be able to abort the version update.  (However, you can still reverse the upgrade process; see below.)

If the validation finds that an application that was previously advertised no longer exists on the newer version, the newer version is said to be incompatible with the previous version, and the versioning process is aborted. The UI will allow you to examine the advertisements that were missing and roll back to the previous version of the workload catalog. If, however, the newer version is compatible with the previous version, the versioning will proceed to the draining stage.
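The compatibility rule described above can be thought of as a set comparison: every application advertised from the previous version must still be installed on the newer one. The following is only a minimal sketch of that idea; `CatalogVersion`, `missing_advertisements`, and `is_compatible` are hypothetical names chosen for illustration and are not part of the app orchestration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogVersion:
    """Hypothetical model of one catalog version (illustration only)."""
    name: str
    advertised_apps: frozenset  # apps advertised from this version
    installed_apps: frozenset   # apps found when scanning one machine


def missing_advertisements(previous, newer):
    """Return the apps advertised on `previous` that `newer` lacks, sorted."""
    return sorted(previous.advertised_apps - newer.installed_apps)


def is_compatible(previous, newer):
    """The newer version is compatible when no advertisement is missing."""
    return not missing_advertisements(previous, newer)


# Made-up example data.
v1 = CatalogVersion("v1", frozenset({"Word", "Excel"}),
                    frozenset({"Word", "Excel", "Visio"}))
v2_good = CatalogVersion("v2", frozenset(),
                         frozenset({"Word", "Excel", "Project"}))
v2_bad = CatalogVersion("v2", frozenset(),
                        frozenset({"Word", "Project"}))

print(is_compatible(v1, v2_good))           # compatible: nothing missing
print(missing_advertisements(v1, v2_bad))   # the advertisements to examine
```

Note that an application that was installed but never advertised (Visio in this example) does not affect compatibility; only advertisements count.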

In this stage, for each newer-version workload machine imported, app orchestration randomly picks an allocated workload machine from the previous version and tries to drain it while allocating the new machine as its replacement. If the old machine has no sessions (either active or disconnected), it is decommissioned immediately; otherwise, it is put into a “draining” mode so that no new connections can be made to it. App orchestration then periodically polls the state of the machine until all sessions are logged off. At that moment, the machine is ready to be decommissioned. Draining a machine may take a few seconds, a couple of days, weeks, or even longer, depending on when all sessions are logged off. The good news is that app orchestration takes care of all of this automatically, and no manual intervention by the admin is needed.
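To make the draining logic concrete, here is a rough sketch of the per-machine decision described above. It is an illustration under stated assumptions, not the product's implementation: the `Machine` class, its state names, and the `start_replacement` and `poll` functions are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical stand-in for a previous-version workload machine."""
    name: str
    sessions: int = 0          # active plus disconnected sessions
    state: str = "allocated"   # "allocated", "draining", or "decommissioned"


def start_replacement(old_machines, rng):
    """Pick a random allocated old machine and begin retiring it.

    Called once for each newer-version machine that is imported.
    Returns the chosen machine, or None if none is left to replace.
    """
    candidates = [m for m in old_machines if m.state == "allocated"]
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    # No sessions: decommission at once. Otherwise stop accepting new
    # connections and wait for the existing sessions to log off.
    chosen.state = "decommissioned" if chosen.sessions == 0 else "draining"
    return chosen


def poll(machine):
    """One polling pass: decommission a draining machine once it is empty."""
    if machine.state == "draining" and machine.sessions == 0:
        machine.state = "decommissioned"
    return machine.state


old = [Machine("old-1", sessions=2)]
picked = start_replacement(old, random.Random(0))
print(picked.state)    # still hosting sessions, so it drains
picked.sessions = 0    # simulate the last users logging off
print(poll(picked))    # now it can be decommissioned
```

In a real deployment the polling would run on a timer, and the session count would come from the machine itself rather than being set by hand.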

When all machines from the previous version of the workload catalog are done draining, the version update process is completed. The workload catalog is successfully promoted to a newer version with all legacy workload machines decommissioned and replaced with newer version workload machines.
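The stages described so far (initialization, validation, draining, completion) form a simple state machine in which an abort is only possible before validation begins. The sketch below is one way to picture it; the `Stage` and `VersionUpdate` names are hypothetical, and treating an admin abort and a failed validation as the same `ABORTED` state is a simplification made for this example.

```python
from enum import Enum


class Stage(Enum):
    INITIALIZATION = "initialization"
    VALIDATION = "validation"
    DRAINING = "draining"
    COMPLETED = "completed"
    ABORTED = "aborted"


class VersionUpdate:
    """Hypothetical model of one catalog version update."""

    def __init__(self):
        self.stage = Stage.INITIALIZATION

    def _require(self, expected):
        if self.stage is not expected:
            raise RuntimeError(
                f"expected stage {expected.value}, but in {self.stage.value}")

    def abort(self):
        """Admin abort: only allowed before validation starts."""
        if self.stage is not Stage.INITIALIZATION:
            raise RuntimeError("abort is only possible before validation")
        self.stage = Stage.ABORTED

    def begin_validation(self):
        self._require(Stage.INITIALIZATION)
        self.stage = Stage.VALIDATION

    def finish_validation(self, compatible):
        """Proceed to draining if compatible; otherwise the update aborts."""
        self._require(Stage.VALIDATION)
        self.stage = Stage.DRAINING if compatible else Stage.ABORTED

    def finish_draining(self):
        self._require(Stage.DRAINING)
        self.stage = Stage.COMPLETED


update = VersionUpdate()
update.begin_validation()
update.finish_validation(compatible=True)
update.finish_draining()
print(update.stage.value)
```

Calling `abort()` on `update` after `begin_validation()` raises an error in this sketch, mirroring the rule that the admin can no longer abort once validation has begun.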

As we can see from the above description, during the draining stage there is typically a time window in which the catalog contains workload machines from both the newer version and the previous version. These machines have different hardware and/or software configurations. If the difference is a hardware upgrade or software updates, it might not be a big deal; however, we need to pay attention to the case where the newer-version machines have additional applications installed and the admin wants to advertise the new apps. If the admin advertises the new apps from the workload catalog before the version update process is complete, users may subscribe to them and try to launch them. Such app launch requests may be routed to previous-version workload machines in the catalog, which do not have the apps installed, so the launch will fail. To avoid this, do not advertise new apps from the catalog until after the version update is completed.

Another point to note is that workload machines are replaced one by one during versioning. If you have a limited pool of machines, you need at minimum only one additional VM for the versioning. First, install the new image on the extra VM and import the VM; after validation, app orchestration will drain a previous workload machine. Once that previous workload machine is drained and decommissioned, you can use it as the extra VM and repeat the steps until all workload machines are promoted to the newer version of the image. However, limiting your upgrade pool to a single machine in this manner will require a much longer time for the version update to complete. App orchestration will use as many machines as you have available to perform the version update procedure, up to the total number of machines that are allocated from a catalog. The more machines you dedicate during the upgrade process, the faster it will complete.
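As a back-of-the-envelope illustration of this trade-off, suppose each "round" consists of importing the spare machines, draining the same number of old machines, and re-imaging those for the next round. The number of rounds then shrinks as the spare pool grows. The function below is a hypothetical helper written for this post, and it deliberately ignores the fact that drain time varies from machine to machine.

```python
import math


def rounds_needed(allocated, spares):
    """Estimate how many import-and-drain rounds a version update takes.

    allocated: machines currently allocated from the catalog
    spares:    extra machines available to carry the new image

    Assumption (a simplification for this sketch): every spare replaces
    exactly one old machine per round, and drained machines are re-imaged
    and reused as spares in the next round.
    """
    if allocated < 0 or spares < 1:
        raise ValueError("need allocated >= 0 and spares >= 1")
    if allocated == 0:
        return 0
    # Spares beyond the number of allocated machines cannot be used.
    usable = min(spares, allocated)
    return math.ceil(allocated / usable)


print(rounds_needed(10, 1))    # a single spare: one round per machine
print(rounds_needed(10, 3))    # three spares
print(rounds_needed(10, 20))   # more spares than machines: one round
```

With ten allocated machines, a single spare VM needs ten rounds, three spares need four, and ten or more spares finish in a single round.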

If you are running Citrix Provisioning Services (PVS) for workload machine provisioning, there is a shortcut. If PVS is configured with vDisk type of Standard Image, after a machine is rebooted, it will get a fresh image from the PVS storage. After a workload machine is decommissioned and moved to the Decommissioned Server OU, you do not have to manually install the newer image on the VM; instead, reboot the VM and move it to the new import OU of the catalog to continue the version update process.  As with the manual upgrade procedure, the more machines (virtual or physical) that you can dedicate during the version update process, the faster the version update will complete.

What happens if you start the upgrade process, but users report problems in the field?  In some cases, you might want to revert to a previous version of the workload catalog, even after the version update process is underway.  You can do so manually by creating another new version of the workload catalog, and placing the machines from the older version of the catalog into the new import OU once they have been decommissioned.  In this scenario, sessions from both the initial version of the catalog, and the undesirable newer version of the catalog, will all drain to machines in the newest version of the catalog – which are really the same machines that were in the initial version.  (There is currently no way to short-circuit or accelerate this process but due to the drain feature, the end-user sessions will not be affected in the meantime.)

To wrap up, this is how patching workload machines works. If you have any questions, feel free to post a comment or ask them on the App Orchestration Forum.

This blog is part of a series on app orchestration. For the rest in this series, please refer to the following blogs: