This is a guest post from George Spiers.
“Citrix is slow!”
How often have you heard that from your end users? It’s one of the most common complaints I hear in the field.
I’ve spoken in the past about how Virtual Apps and Virtual Desktop deployments are performance sensitive. You see, the performance of such deployments depends on the health of many different components and layers. These include, but are not limited to, Domain Controllers, DNS infrastructure, Delivery Controllers, StoreFront, storage arrays, and the network.
Recently, I joined Goliath Technologies to co-deliver a webinar on the topic How to Troubleshoot “Citrix Is Slow” and Prove It Isn’t Citrix. We had a great response, with more than 1,650 registrations and more than 550 tuning in on the day of the webinar. It’s a clear sign that this is a hot topic for all Citrix administrators today.
We focused the content of the webinar on providing three sample scenarios where Citrix is slow. In each case, we discussed two troubleshooting approaches: one where you had access to free tools such as Citrix Director and another where you had access to third-party tools such as Goliath Performance Monitor.
Scenario 1 – Application Outage
A major health system running MEDITECH via Citrix Virtual Apps recently upgraded MEDITECH. A misconfiguration introduced during the upgrade left 40,000 clinicians and staff unable to access the application.
Such a wide-scale fault in the delivery of MEDITECH put intense pressure on the IT department and Citrix administrators, who were now dealing with a major incident with serious implications for patient care.
Troubleshooting Application Outages Using Free Tools
Typically, the Citrix administrator will manually try to launch MEDITECH to see whether an error message is displayed and whether it comes from the application itself or from the Citrix environment. This manual launch exercises many different Citrix infrastructure components, such as the Delivery Controllers, StoreFront servers, Citrix license server, SQL, and more.
If no useful error message is displayed, you would check Citrix Director for failed VDAs, for failed connections, and for the error messages recorded against them. Director also shows basic Delivery Controller service health, license server health, and any Hypervisor alerts.
You would also involve the following teams to check out their parts of the infrastructure:
- The server infrastructure team would perform a health check of the Hypervisor blades. That includes checking if any nodes are in a failed state, if CPU/RAM consumption is saturated, if required virtual machines are powered on, and so on.
- The network team would look for faults in the path between end users and the Virtual Apps servers and the backend MEDITECH infrastructure. Given that there is a complete outage, the investigation would likely focus on the data center/core network.
- The application support or vendor would perform tests against the MEDITECH infrastructure, ensuring that all the different components are running and appearing healthy.
In this scenario, when trying to log on and launch MEDITECH manually, the failure point was application enumeration. This issue was detected during the manual attempted launch of MEDITECH by a Citrix administrator, but it could also have been reported by an end user during triaging with the help desk. The application enumeration failure was traced to permissions, which were then updated accordingly.
- Resolution time: Two hours
- Users impacted: 40,000
Troubleshooting Application Outages Using Paid-For Tools
The resolution in this scenario proactively corrected the issue before it had an impact on users. Goliath Application Availability Monitor (GAAM) was running in the environment, with Goliath Virtual Users running tests against MEDITECH on a predetermined schedule.
The Goliath Virtual User detected the outage immediately, and an alert was generated containing details of the outage, including screenshots from each stage of the test to help determine root cause. This allowed the appropriate personnel to review the screenshots for more information and provided a clearer understanding of the issue. Armed with these screenshots, details, and analytics, the Citrix administrator quickly determined that application enumeration and permission settings were the failure point, and the permissions were updated accordingly.
The difference with this troubleshooting scenario was that the entire Citrix infrastructure and supporting components were being tested automatically by GAAM. Alerts and screenshots were provided as a service to the Citrix administrators, and the issue was resolved before users were impacted.
- Resolution Time: 10 minutes
- Users Impacted: 0
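The scheduled test described above can be thought of as a chain of stages that stops at the first failure and reports which stage broke. The following is a minimal sketch of that idea only, not how GAAM is implemented. The stage names and the three check functions are hypothetical stand-ins; a real probe would drive an actual logon, enumeration, and launch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StageResult:
    name: str
    ok: bool
    detail: str


def run_probe(stages: List[Tuple[str, Callable[[], None]]]) -> List[StageResult]:
    """Run each stage in order and stop at the first one that raises."""
    results: List[StageResult] = []
    for name, check in stages:
        try:
            check()
        except Exception as exc:
            results.append(StageResult(name, False, str(exc)))
            break
        results.append(StageResult(name, True, "ok"))
    return results


def first_failure(results: List[StageResult]) -> Optional[StageResult]:
    """Return the failing stage, or None if every stage passed."""
    return next((r for r in results if not r.ok), None)


# Hypothetical checks that simulate the outage from this scenario.
def check_storefront_logon() -> None:
    pass  # assume the test account logs on successfully


def check_enumeration() -> None:
    raise PermissionError("no applications returned for the test account")


def check_launch() -> None:
    pass  # never reached in this simulation


STAGES = [
    ("StoreFront logon", check_storefront_logon),
    ("Application enumeration", check_enumeration),
    ("Application launch", check_launch),
]

failure = first_failure(run_probe(STAGES))
if failure is not None:
    print(f"ALERT: stage '{failure.name}' failed: {failure.detail}")
```

Run on a schedule, a check like this surfaces the failing stage before a user has to report it, which is the difference between the two resolution times in this scenario.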
Scenario 2 – Slowness During Logon
In this scenario, slow logons increased user frustration, decreased efficiency, and frustrated the help desk team, who were taking calls from angry users.
Troubleshooting Logon Slowness Using Free Tools
Logon slowness can be difficult to troubleshoot on your own because there could be many reasons logon times are slow. First, you should try to establish patterns and run through a process of elimination:
- How many users are reporting logon slowness?
- Are users all located in a particular office?
- Are users all using a particular application or desktop?
- Are users hitting a particular data center for their application or desktop?
- Are users logging on remotely or are they logging on from an office location?
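If you can export logon records with these attributes, the process of elimination above can be partly automated. The sketch below is only an illustration under assumptions: the record fields (`office`, `delivery_group`, `logon_s`) and the sample values are hypothetical, and ranking by the gap between the slowest and fastest group is a simple heuristic, not a Director feature.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical logon records; in practice these would come from an export.
LOGONS = [
    {"office": "Office A", "delivery_group": "Desktop-1", "logon_s": 30},
    {"office": "Office A", "delivery_group": "Desktop-2", "logon_s": 92},
    {"office": "Office B", "delivery_group": "Desktop-1", "logon_s": 28},
    {"office": "Office B", "delivery_group": "Desktop-2", "logon_s": 88},
    {"office": "Office A", "delivery_group": "Desktop-1", "logon_s": 32},
    {"office": "Office B", "delivery_group": "Desktop-2", "logon_s": 90},
]


def mean_logon_by(records, key):
    """Average logon duration (seconds) for each value of one attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["logon_s"])
    return {value: mean(times) for value, times in groups.items()}


def rank_dimensions(records, keys):
    """Order attributes by the gap between their slowest and fastest group."""
    scored = []
    for key in keys:
        averages = mean_logon_by(records, key)
        gap = max(averages.values()) - min(averages.values())
        scored.append((key, gap))
    return sorted(scored, key=lambda item: item[1], reverse=True)


for key, gap in rank_dimensions(LOGONS, ["office", "delivery_group"]):
    print(f"{key}: {gap:.1f}s gap between slowest and fastest group")
```

With this sample data the delivery group separates slow and fast logons far more cleanly than the office does, so that is where troubleshooting would start.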
One of the first tools to use is Director and the logon duration metrics that it captures per user. Search for an affected user and see which parts of the logon are slow. Citrix Director records processing times for HDX session connection, GPOs, Logon Scripts, Profile Load, and more.
Then, engage with other teams such as networking to have them perform network tests and establish if any latency is occurring between each of the global offices and each data center.
The server infrastructure team will review Hypervisor and virtual machine resource consumption and capacity across both VDAs and the supporting infrastructure servers.
If you have identified a pattern, for example that only a certain set of desktops is affected, concentrate your troubleshooting efforts on those. Launch the desktop yourself and determine whether you see the same logon slowness. Also review Event Logs and CPU/RAM utilization, and check whether there are enough desktops in operation to serve connecting users.
- Resolution time: Five hours
Troubleshooting Logon Slowness Using Paid-For Tools
The resolution in this scenario was much simpler and faster. Goliath Performance Monitor was running in the environment and producing advanced logon-duration reports.
The Citrix administrator used the GPM web-based console to search for an affected user and to review the 33+ stages of the logon process to determine the delay point. The Goliath logon duration reports for each user session include metrics such as:
- GPO processing time at a granular per-GPO level
- Profile load time
- Brokering time
- The time it takes to map client drives, devices, and ports from the client device
- The time it takes for the Delivery Controller’s XML service to resolve the name of a published application or desktop to a VDA address
- The time it takes wfica32.exe on your client machine to establish a connection with a VDA
In this scenario, it was quickly detected that a specific GPO was adding 37 seconds to logon time because it was setting unnecessary registry data.
- Resolution time: Five to 10 minutes
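The reasoning behind a per-stage logon report is straightforward: compare each stage of a slow logon with a known-good baseline, then drill into the stage with the largest excess. Here is a minimal sketch of that comparison. The stage names, GPO names, and timings are hypothetical sample data chosen to mirror this scenario; they are not output from Goliath Performance Monitor or Director.

```python
# Hypothetical per-stage timings in seconds for a known-good logon.
BASELINE = {
    "Brokering": 1.5,
    "HDX connection": 2.0,
    "GPOs": 4.0,
    "Logon scripts": 3.0,
    "Profile load": 5.5,
}

# Hypothetical timings for an affected user's logon.
SLOW_SESSION = {
    "Brokering": 1.5,
    "HDX connection": 2.0,
    "GPOs": 41.0,
    "Logon scripts": 3.0,
    "Profile load": 6.0,
}

# Hypothetical per-GPO breakdown of the "GPOs" stage for the same logon.
GPO_TIMES = {
    "Default Domain Policy": 1.0,
    "Printer Mappings": 2.0,
    "Legacy Registry Settings": 38.0,
}


def excess_over_baseline(session, baseline):
    """Seconds each stage took beyond its baseline (negative if faster)."""
    return {stage: session[stage] - baseline.get(stage, 0.0) for stage in session}


def worst_offender(durations):
    """Return the (name, seconds) pair with the largest value."""
    name = max(durations, key=durations.get)
    return name, durations[name]


stage, extra = worst_offender(excess_over_baseline(SLOW_SESSION, BASELINE))
print(f"Slowest stage vs baseline: {stage} (+{extra:.1f}s)")

gpo, seconds = worst_offender(GPO_TIMES)
print(f"Slowest GPO: {gpo} ({seconds:.1f}s)")
```

The second step is what makes per-GPO granularity valuable: knowing only that the GPO stage is slow still leaves every linked policy as a suspect.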
Scenario 3 – Slowness During a Workflow
A top 10 US health system with more than 100 locations nationally encountered end users from multiple locations reporting significant performance impact when scanning to electronic health records. This increased clinician stress and frustration and had an impact on patient care due to the inability to process records and documents through the EHR system.
Troubleshooting Session Slowness Using Free Tools
When a specific user workflow is slow, you often need to determine which components make up that workflow so that you can begin troubleshooting them. That’s the difficult part, and it makes troubleshooting time-consuming and complicated. Not having the correct monitoring solutions in place adds to the difficulty and often leaves you to resolve the issue on your own.
In this scenario, initial investigation with end users discovered that the scanning workflow was encountering significant slowness.
What do you do next? Fire in the dark. You may upgrade the scanner drivers on all devices, which has the potential to introduce further instability. You will open calls with the EHR and scanning vendors for troubleshooting advice and assistance, involve network teams to investigate if any slowness is occurring between the EHR environment paths and the scanner, collect and review diagnostic logs in an attempt to find something useful, and so on.
Ultimately, all of these tasks take many hours, and many of them prove insufficient and a waste of time.
- Resolution time: Two weeks (to update drivers)
Troubleshooting Session Slowness Using Paid-For Tools
The resolution in this scenario was much quicker. Goliath Performance Monitor was running in the environment and collecting a large number of metrics about the Citrix HDX session.
The Citrix administrator used the GPM web-based console to search for an affected user and review the session bandwidth reports to identify the slow point. The Goliath session bandwidth reports show metrics such as:
- Bandwidth consumption per ICA virtual channel
- Input line speed
- Output line speed
- ICA latency
In this scenario, it was quickly detected that high ICA latency was being reported across multiple PCs on the day the scanning slowness was reported. The Citrix administrator could easily review historic reports to establish a baseline for ICA latency. As a result, the focus shifted from upgrading scanner drivers to investigating the network. Further investigation found that a large number of packets were being dropped, causing retransmits of data and slowing network traffic.
- Resolution time: One to two hours
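The baseline comparison in this scenario can be sketched in a few lines: take the median of each PC's historic ICA latency as its baseline, then flag any PC whose latency on the day of the incident is well above it. This is an illustration under assumptions only. The PC names, the millisecond samples, and the factor of 2 are all hypothetical, and the code does not reflect how Goliath Performance Monitor calculates its reports.

```python
from statistics import median

# Hypothetical historic ICA latency samples in milliseconds, per PC.
HISTORY_MS = {
    "PC-01": [20, 22, 19, 21],
    "PC-02": [25, 24, 26, 25],
    "PC-03": [18, 20, 19, 21],
}

# Hypothetical ICA latency on the day the slowness was reported.
TODAY_MS = {"PC-01": 180, "PC-02": 210, "PC-03": 22}


def is_anomalous(sample_ms, history_ms, factor=2.0):
    """True when a sample exceeds the historic median by the given factor."""
    return sample_ms > factor * median(history_ms)


def affected_endpoints(today, history, factor=2.0):
    """Names of PCs whose latency today is well above their own baseline."""
    return sorted(
        pc for pc, sample in today.items()
        if pc in history and is_anomalous(sample, history[pc], factor)
    )


flagged = affected_endpoints(TODAY_MS, HISTORY_MS)
print(f"PCs above baseline: {flagged}")
if len(flagged) > 1:
    print("Several PCs are affected at once; a shared cause such as the network is likely.")
```

Several endpoints breaching their baselines on the same day points away from any single scanner or driver and toward something they share, which is the reasoning that redirected this investigation to the network.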
Want to learn more about troubleshooting slowness in your Citrix environment? Watch the recorded webinar. If you have any questions on the topic of slowness in a Citrix environment or want to let us know how this webinar helped you or your colleagues troubleshoot slowness more effectively, please leave a comment below. And check out these additional resources to help you get the most out of your Citrix environment:
- A battle of Synthetic Application Availability Testing: Citrix App Probing vs Goliath Application Availability Monitor
- Advanced Logon Duration Troubleshooting with Goliath Performance Monitor
Working for Novosco, a major managed cloud provider in the UK, George Spiers shares his expert consultancy, architectural, and professional support knowledge of Citrix and Microsoft products mainly in the healthcare sector. George was part of the Citrix Technology Professionals Class of 2018. You can find him at http://www.jgspiers.com/ and on Twitter and LinkedIn.