Did you know that you can automatically recover XenApp and XenDesktop services through the Citrix SCOM Management Pack for XenApp and XenDesktop? Here’s how.
A key feature monitors all vital Delivery Controller services, such as Citrix Broker Service, Citrix Host Service, Citrix Machine Creation Service and similar services. In the XenApp/XenDesktop Management Pack, each of these has a monitor called Service Running State. This monitor periodically checks whether the service is running.
The monitor is designed to raise an alert when the service ceases to run for some reason, but only for the services that have their startup type set to automatic. Services that are started manually and disabled services are skipped. You can change this behavior by overriding the default value of the “alert only if service startup type is automatic” parameter; set this parameter to “false.” This way, all services will trigger an alert when in the stopped state.
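The alerting rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how the Management Pack implements it; the function name `should_alert` and its parameter names are hypothetical.

```python
def should_alert(service_state, startup_type, alert_only_if_automatic=True):
    """Decide whether a service in the given state should raise an alert.

    alert_only_if_automatic mirrors the monitor's override parameter
    (hypothetical name); its default of True matches the default behavior.
    """
    # A running service never raises an alert.
    if service_state.lower() == "running":
        return False
    # Default behavior: manual and disabled services are skipped.
    if alert_only_if_automatic:
        return startup_type.lower() == "automatic"
    # Override set to false: every stopped service raises an alert.
    return True
```

With the default setting, `should_alert("Stopped", "Manual")` evaluates to `False`; with the override set to false, the same stopped service would raise an alert.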
Normally, when the SCOM administrator receives an alert, some actions are required to handle the situation. It may take some time for the issue to get resolved – in this case until the service is restarted on a specific system. What if the XenApp and XenDesktop administrator wanted this issue to get resolved automatically?
SCOM provides diagnostic and recovery tasks for such cases. When a monitor transitions to a warning or critical state, diagnostic and recovery tasks can automatically run on the monitored object to help investigate and resolve the problem. Diagnostic tasks attempt to discover the root cause or at least provide you with additional information to assist you with the diagnosis. In contrast, recovery tasks try to fix the problem themselves. Diagnostic and recovery tasks can invoke scripts or command-line executables.
Now, let’s go back to our Service Running State monitor. There are two ways to create a recovery task that resolves the ServiceNotRunning state.
The first option is to create a simple command-line recovery task that attempts to start the service when the latter gets into the stopped state. Here is how you do it (I chose Citrix Machine Creation Service for this example). First open Health Explorer on the service object. Under the Availability group, search for the “Service Running State” monitor. Right-click the “Service Running State” monitor, select “Properties”, click the “Diagnostic and Recovery” tab, and then click “Add” under “Configure recovery tasks” to add a recovery task.
The Create Recovery Task Wizard appears. If you have not yet created a management pack for custom implementations, you can do that now. After the new management pack is ready, select “Run Command” for the recovery task type and click “Next”.
Type in the recovery task name and its description. Select “Critical” for the health state for which this recovery will run. Because this monitor only has two states, Healthy and Critical, this is the only applicable option. Fill in the “Recovery target” text box based on your type of service (in our case this was XAXD Service Machine Creation). Check “Run recovery automatically” and “Recalculate monitor state after recovery finishes.” Selecting “Run recovery automatically” forces the recovery task to run as soon as the monitor enters the Critical state. If this option were not checked, a link to run the recovery task would be shown in the alert message, so you could run it manually. Selecting “Recalculate monitor state after recovery finishes” triggers the monitor to recalculate and acquire the new state of the service from the system.
In the next wizard page, specify the command line for execution. Let’s use the “net.exe start service” command for starting the service. In the “Full path to file” text box, type “%windir%\system32\net.exe” (without double quotes). In the “Parameters” text box, type “start CitrixMachineCreationService” (without double quotes).
Set the timeout to 120 seconds and click “Create”.
Now you have created a recovery task that will automatically attempt to start the service in case it is not running.
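To make clear what the recovery task executes, here is a minimal Python sketch of the equivalent command line and the 120-second timeout. SCOM runs the command itself, so this is not part of the configuration. The helper names `build_recovery_command` and `run_recovery` are hypothetical, `C:\Windows` is assumed as the value of `%windir%`, and treating a zero exit code as success is an assumption of this sketch.

```python
import subprocess


def build_recovery_command(service_name, windir=r"C:\Windows"):
    """Return the command line as an argument list.

    windir stands in for the %windir% variable used in the wizard;
    C:\\Windows is an assumed default.
    """
    return [windir + r"\system32\net.exe", "start", service_name]


def run_recovery(command, timeout_seconds=120):
    """Run the command and report success (Windows only).

    subprocess.run raises subprocess.TimeoutExpired if the command
    does not finish within timeout_seconds.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_seconds
    )
    # Assumption: a zero exit code means the service was started.
    return completed.returncode == 0
```

On a Delivery Controller, `run_recovery(build_recovery_command("CitrixMachineCreationService"))` would issue the same `net.exe start CitrixMachineCreationService` call that the recovery task is configured with.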
To test the recovery action, log on to a Delivery Controller system and stop Citrix Machine Creation Service. Shortly afterwards, SCOM will notice that the status of this service is Not Running and will trigger an alert from the Service Running State monitor. Immediately after the alert is triggered, our recovery task will start and try to bring the service back to the Running state. When the task has successfully completed, the monitor will recalculate its state and change back to Healthy.
You can check the State Change history for the monitor in the Health Explorer.
Here, you can see that when the monitor went to the Critical state, the Start Citrix Machine Creation Service recovery task ran and started the service. After that, the monitor recalculated its state and changed it back to Healthy.
Because this monitor is configured to automatically close the alert, you can find a corresponding XenDesktop Controller Service Stopped alert in the Closed Alerts list.
This was an example of how to create a simple recovery task for stopped services.
But what if you do not find this kind of recovery robust enough? What if a service has a severe problem and does not always start at the first attempt? Would it make sense to try to start the service three times over a three-minute period, and raise an alert only if the service still has not started? In this case you need to create a more advanced recovery task with a custom script and more complex logic. I will explain those steps in one of the following blog posts.
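As a rough illustration of the retry idea only, and not the script from the follow-up post, the logic could look like the Python sketch below. The function `start_with_retries` and its parameters are hypothetical; the caller supplies the functions that start the service and check its state, and one-minute waits are an assumed way of spreading three attempts over three minutes.

```python
import time


def start_with_retries(start_service, is_running, attempts=3,
                       wait_seconds=60, sleep=time.sleep):
    """Try to start a service several times before giving up.

    start_service and is_running are callables supplied by the caller.
    Returns True once the service is running, or False if every attempt
    failed, in which case an alert should be raised.
    """
    for _ in range(attempts):
        start_service()
        # Give the service time to come up before checking its state.
        sleep(wait_seconds)
        if is_running():
            return True
    return False
```

Passing `sleep` as a parameter keeps the sketch easy to try out without actually waiting three minutes.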