Citrix Presentation Server 4.5 introduced a new feature called Health Assistant, also known as Health Monitoring and Recovery (HMR). This feature maintains the health of critical services in a Presentation Server farm. Every Presentation Server runs a service called the Health Check Agent (HCA) that periodically runs tests to verify the health of that server. If a test fails, the system can take a pre-configured corrective action in addition to writing to the event log. Each test has its own configuration: how often it runs, how long it waits before timing out, how many times it runs before it reports a failure, and what action it takes on failure. For every test, the admin can configure one of these 5 actions to execute on failure:
1. Alert only (shown in the AMC)
2. Remove server from load balancing – This ensures that new users don't end up on a bad machine. Existing sessions keep running, and reconnects to disconnected sessions and direct server connections are still allowed
3. Shutdown IMA service
4. Restart IMA service
5. Reboot server
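To make the per-test settings concrete, here is a minimal sketch in Python of how such a configuration could be modelled. It is not the HMR implementation or its SDK: the names `HealthTestConfig`, `RecoveryAction` and `evaluate` are hypothetical, and the retry rule (report failure only when every attempt fails) is one reading of "how many times it runs before it reports a failure".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class RecoveryAction(Enum):
    """The five failure actions listed above (names are illustrative)."""
    ALERT_ONLY = 1
    REMOVE_FROM_LOAD_BALANCING = 2
    SHUTDOWN_IMA = 3
    RESTART_IMA = 4
    REBOOT_SERVER = 5


@dataclass
class HealthTestConfig:
    name: str
    interval_seconds: int    # how often the test runs (scheduling is not modelled here)
    timeout_seconds: int     # how long a single run may take before it times out
    failure_threshold: int   # failed runs needed before failure is reported
    action: RecoveryAction   # what to do once failure is reported


def evaluate(config: HealthTestConfig,
             run_once: Callable[[int], bool]) -> Optional[RecoveryAction]:
    """Run the test up to failure_threshold times.

    run_once receives the timeout and returns True when the run passed.
    Returns None if any run passes, otherwise the configured action.
    """
    for _ in range(config.failure_threshold):
        if run_once(config.timeout_seconds):
            return None
    return config.action
```

For example, a hypothetical IMA check configured with `failure_threshold=3` and `RecoveryAction.RESTART_IMA` would only request a restart after three failed runs in a row.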
PS 4.5 has the following 4 health packs/tests:
1. Logon/Logoff test – This test monitors the logon/logoff cycle to determine whether there is a problem with session initialization. When the time between logon and logoff is below a given value, it normally indicates a problem. Too many of these short cycles within a given time period indicate a problem with the health of the server
2. Terminal Services test – This test ensures that sessions can be enumerated, session information can be retrieved, etc., to gauge the health of Terminal Services (similar to what utility does)
3. IMA test – This test enumerates applications to check the health of the IMA service
4. XML Ticket Request test – This test checks the health of the XML service by ensuring that it is able to process XML ticket requests
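The Logon/Logoff test described above boils down to counting short sessions inside a time window. The following Python sketch illustrates that idea only; it is not how the shipped test is written. The class name `ShortCycleDetector` and its three thresholds are assumptions, and it expects sessions to be recorded in order of logoff time.

```python
from collections import deque


class ShortCycleDetector:
    """Flags a server when too many short logon/logoff cycles occur in a window."""

    def __init__(self, min_session_seconds, max_short_cycles, window_seconds):
        self.min_session_seconds = min_session_seconds  # shorter sessions are suspicious
        self.max_short_cycles = max_short_cycles        # tolerated short cycles per window
        self.window_seconds = window_seconds            # length of the sliding window
        self._short_cycle_times = deque()               # logoff times of short cycles

    def record_session(self, logon_time, logoff_time):
        """Record one cycle (times in seconds); return True if the server looks unhealthy."""
        if logoff_time - logon_time < self.min_session_seconds:
            self._short_cycle_times.append(logoff_time)

        # Forget short cycles that have fallen out of the sliding window.
        cutoff = logoff_time - self.window_seconds
        while self._short_cycle_times and self._short_cycle_times[0] < cutoff:
            self._short_cycle_times.popleft()

        return len(self._short_cycle_times) > self.max_short_cycles
```

With `ShortCycleDetector(5, 2, 60)`, a third session shorter than 5 seconds within 60 seconds would trip the detector, while a single quick logoff would not.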
With the PS 4.5 Feature Pack 1, we released 6 new health packs/tests. If you are just looking for these 6 new health packs, you can also download them from the following location (and install them on PS 4.5 Enterprise or Platinum servers) – http://support.citrix.com/article/CTX112805
5. Microsoft Print Spooler test – This test ensures Microsoft print spooler reliability. It enumerates printers on the local server, as well as printer drivers and print processors. Exercising these tasks is fundamental to gauging the health of the print service
6. Citrix Print Manager Service test – This test verifies the health of the service by enumerating local session printers, etc.
7. Check DNS test – The Check DNS test by default will run a forward DNS lookup and a reverse DNS lookup to ensure that there are no DNS related errors that can degrade the health of the server.
8. ICA Listener test – The responsibility of this test is to ensure that ICA clients can make a successful connection to the local server via the ICA protocol. This functionality is validated by pinging the ICA listener and monitoring the response.
9. Check XML Threads – This test monitors whether the XML service is being overloaded with traffic. When this happens, Web Interface/PN Agent connections will suffer. This test alerts administrators that they may need to address XML server performance.
10. Check Local Host Cache test – This test is responsible for recognizing and responding to LHC corruptions and inconsistencies on the local machine that might have resulted from stale data left behind when removing a server and/or published application. LHC inconsistencies refer to duplicate entries or entries that do not match the data store objects.
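As an illustration of the Check DNS test above, here is a rough Python sketch of a forward lookup followed by a reverse lookup. It is not the code of the health pack. The function name `check_dns` is hypothetical, and the final step (requiring the reverse lookup to return the same short host name) is an assumption; the description above only says that both lookups are run. The resolver functions are parameters so the check can be exercised without a live DNS server.

```python
import socket


def check_dns(hostname,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward plus reverse DNS lookup of hostname."""
    try:
        address = forward(hostname)              # name -> IPv4 address string
    except OSError as exc:                       # gaierror/herror derive from OSError
        return False, f"forward lookup failed: {exc}"

    try:
        resolved_name, _aliases, _addresses = reverse(address)  # address -> name
    except OSError as exc:
        return False, f"reverse lookup failed: {exc}"

    # Assumption: compare only the short host names, ignoring case and domain suffix.
    if resolved_name.split(".")[0].lower() != hostname.split(".")[0].lower():
        return False, f"reverse lookup returned {resolved_name!r}"

    return True, address
```

Calling `check_dns(socket.gethostname())` on a server would run both lookups against whatever resolver the operating system is configured to use.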
There is an HMR SDK that can be used to write new tests. Check out this link for more information – http://support.citrix.com/article/CTX112283
All the tests have been authored in such a way that they have little to no impact on system performance and scalability. However, the number of tests and how often they run determine any possible impact on system performance. We believe that if you keep the number of tests to 6 or fewer and run them at intervals of 5 minutes or longer, you will see near-zero performance degradation. As always, it is best to test this in your QA environment first.
We are looking at adding more actions in the future (such as the ability to refresh/recreate the LHC, restart the MS print spooler service or the Citrix print service, register with the DNS server, and restart the ICA Winstation) that can be tied to some of these health packs. We also plan better integration with our EdgeSight technology. Please leave a comment if you have any suggestions for further improvements in this area.