In Part 1 of this NetScaler VPX on HyperFlex blog series, I set the stage for the project and documented our experience up through the measurement of what I’m calling our “Baseline Workload”. In case you missed that article, here are the “CliffsNotes”:

  • Our test rig is a 4-node Cisco HyperFlex cluster of HX240 servers. It’s very powerful, flexible, and blessedly simple (a testament to HCI’s ability to deliver on the promise of breaking down silos inside of IT departments).
  • We settled on a baseline workload configuration of Windows Server 2012 R2 with Office 2016, provisioned via MCS, which is easily reproducible since we’ve got Eric Haavarstein’s automation framework inside the Silverton lab.

We ruminated on how many XenApp VMs to run and how to configure them, so we put our LoginVSI skills to the test and put some data behind our decision. The winning config: 20 XenApp server VMs, each with 4 vCPUs and 20GB of RAM. Here’s how they look from inside of Citrix Studio:

rickd1

  • We ran our baseline workload through a bunch of LoginVSI passes, and settled on some results we could reproduce at will. The highlights:
    • VSImax v4: 523 sessions, after correcting for 2 stuck sessions.
    • VSI baseline average response time: 638ms – one of the best scores I’ve ever seen. Anywhere.
    • VSImax average response time: 1380ms, and we didn’t reach the VSImax threshold of 1639ms.
  • Storage latency stayed at or below 5ms during the peak workload.
  • We recorded vSphere cluster CPU utilization of ~78% and cluster memory utilization of ~67%, measured from vCenter, at peak workload.
  • We determined that CPU was our bottleneck, on this test rig with this workload.
  • We highlighted some of the key differences between our tests and Cisco’s latest work to help avoid any confusion between the results.

In this post, we’ll talk through how we managed to test sessions running through NetScaler.
(Spoiler alert – the Citrix community came to the rescue!)

We’ll also talk you through our first round of tests running our baseline workload through a NetScaler VPX running off-cluster, and analyze the impact on performance/scalability of the HyperFlex cluster. Finally, we’ll share some of the first bits of utilization data we’ve gathered from that VPX under load. Let’s get at it, shall we?

1 – The Problem We Didn’t Know We Had

Our plan started out pretty simply, and looked something like this:

  1. Create a reproducible baseline workload that would stress our Cisco HyperFlex cluster of 4 nodes (with no NetScaler).
  2. Measure this baseline workload, identifying where our bottlenecks lie.
  3. Re-route sessions through NetScaler appliances (running both on and off the HyperFlex cluster).
  4. Measure the impact on the baseline workload.
  5. Share what we learn.

Steps 1 and 2 were relatively easy, though if you read Part 1 of this series, you’ll probably get a chuckle out of the gyrations we went through to be reasonably scientific with this process! Step 3, on the other hand…

Going in, we knew we’d have some work ahead of us because LoginVSI’s built-in launchers can’t handle launching sessions through NetScaler. I’m not quite sure how they’ve gotten this far without this capability, but we assumed step 3 was also going to be pretty straightforward since we thought it had been done before: Citrix Consulting and the Solutions Lab had posted a PowerShell script that looked like it did everything we needed…

Well, we shouldn’t have assumed! Once we got into it, we realized that the script (published in 2013, and not updated since) wouldn’t work with modern browser, StoreFront, and NetScaler versions. It’s possible that we could have hacked and reverse engineered our way through it, but the method they were using back then seems pretty ridiculous today. There HAD to be another solution, or we were SOL!

That’s when we turned to the Citrix community, specifically the members of our rock star CTP program, for help. After a lively intellectual debate about the best way to handle our problem, Andrew Morgan (@andyjmorgan) and the crew over at ThinScale came back with a solution. They’ve painstakingly written a powerful and handy multi-protocol display ‘connector’ library for their ThinKiosk product, which, with a little tweaking, they thought could be adapted to do what I needed. After a bit of discussion about the specifics, they disappeared into the proverbial code cave to work some magic. A few days later I received a wonderful care package: a module I could snap right into LoginVSI to get the job done! Believe it or not, the darned thing worked perfectly for me the first time out – no muss, fuss, trial, or error. Special thanks to Dave Coombes (@dave_coombes), Remko Weijnen (@remkoweijnen), and Andrew for making this project happen!

2 – Our NetScaler Gateway-Friendly LoginVSI Setup

Now armed with a shiny new, NetScaler-friendly launcher, we were able to get back to business! All we needed to do to ‘flip’ our load over to run through a NetScaler of our choice was to update the connection to call NSGWLauncher.exe and pass it the variables needed to get the job done:

rickd2
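As a rough illustration (the parameter names below are my own placeholders – not necessarily what the ThinScale module actually expects), the updated LoginVSI connection command line looked conceptually like this:

```
REM Hypothetical example – parameter names are placeholders
NSGWLauncher.exe ^
  -GatewayURL https://gateway.lab.local ^
  -User {username} -Password {password} ^
  -Domain LAB -Resource "Baseline Desktop"
```

LoginVSI substitutes the per-session credentials at launch time, so the same connection definition works for every launcher in the pool.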

One quick aside – if I may (there’s a slight chance I’m a bit ADD, but I can neither confirm nor deny the allegations!). By this point I’d made some additional tweaks to Eric Haavarstein’s PoSH script for automating test runs. We’d already created multiple different test profiles, which we call from his script:

rickd3

The “-GW” profiles, as you might imagine, are duplicates of the ‘standard’ profiles with the exception of the modified connection settings. When it’s time to run a test, I simply update the parameters in the script and then run it:

rickd4

When I come back in a couple of hours, the pass is complete and the environment is pretty much reset and ready to run another test scenario. Pretty cool, eh?
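To give a feel for the flow (this is an illustrative sketch only – the variable and script names are placeholders, not the actual contents of Eric’s script):

```powershell
# Illustrative sketch – names are placeholders, not Eric's actual script.
$TestProfile  = 'Baseline-GW'   # which LoginVSI test profile to run
$SessionCount = 525             # target number of sessions

# Kick off the LoginVSI pass; the script resets the environment when done
.\Start-TestRun.ps1 -Profile $TestProfile -Sessions $SessionCount
```

Swapping between the ‘standard’ and “-GW” profiles is then just a one-line parameter change between runs.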

3 – Our “Off Cluster” NetScaler Gateway Test

Now that our LoginVSI setup has been tweaked and tested, it’s time to get on with the next round. For this test, we’re looking to do a few things:

  • Measure the performance/scale impact on our HyperFlex cluster when sessions are run through a NetScaler VPX that’s not running on the cluster.
  • Start to explore how to measure the load on a NetScaler appliance.

For this test, we kept everything identical, with the exception of the modified connection parameters in the VSI test. In this pass, we’re actually leveraging the same pimped and primed VPX instance (running on an SDX appliance) that Dave Brett (@dbretty) configured and shared in his epic “Citrix NetScaler, Federated Authentication, and Google” blog series. We simply added another NetScaler Gateway Virtual Server, and placed it on the same subnet as the launchers to eliminate the impact any routing delays might add:

rickd5

With the VPX up and ready and the launch script tweaked, we’re off to the races!

4 – Our Results

Now let’s go ahead and tear into the results and see what we learned, shall we? For reference, I kicked off the test at 11:37, sessions started logging in at 11:39, and logoffs began about an hour later. This will come in handy when we look at some of the charts later on.

Let’s start with the Summary chart straight out of the LoginVSI Analyzer tool:

rickd6

First off, let’s call out the VSI baseline score of 647. As I mentioned in Part 1, this “Very good” score is well within the ‘normal’ range we’ve seen across a TON of completed test runs. For reference – the best we’ve seen is 638, the worst is 651. Now let’s move on to the VSImax. As we saw in the ‘normal’ test run (i.e. not through a NetScaler), we didn’t reach VSImax. We also had no stuck sessions, so no adjustments were needed to the total number – they all executed successfully.

Now let’s move on to the VSIMax Overview chart:

rickd7

Here we’ll focus our attention on the blue line, which is the VSI Index Average. Notice it gives a nice, smooth progression as we approach the top of the load – that’s good stuff. We also didn’t cross the VSI Threshold, so we don’t technically have a VSImax number. One interesting thing to note is the VSImax average, at 1373ms – that’s actually a bit better than we saw without NetScaler in the mix!

Let’s compare these results with the results from the ‘no NetScaler’ run and see how we fared (best results are in bold):

rickd_table 1

Dang! That’s about as identical as you can get! At this point, it’s looking like the HyperFlex cluster doesn’t care one way or another if sessions are running through a NetScaler – though keep in mind that the NetScaler isn’t running on the HyperFlex cluster at this point. Let’s keep moving along and see what else we can gather.

Let’s look at the storage cluster performance up to the peak workload (525 users at 12:30):

rickd8

You’ll notice I waited a bit for this screenshot – there was one high blip in latency at the beginning which was masking the details during the test by blowing out the Y-axis values. As we’d seen before, the HX Data Platform storage performed admirably, staying well under an average of 5ms read/write latency throughout the test. I still don’t have a clean way to compare the cluster performance charts (though I do have ESXTOP data I can analyze later), but from eyeballing it, I’d suspect they’re comparable, if not a bit lower…? Man – I may have to munge through the ESXTOP data yet!

Finally, let’s look at the cluster-wide resource consumption at peak workload:

rickd9

Applying the same back-of-the-napkin math, I get 79.55% CPU utilization and about 67.56% memory utilization. I take these with a grain of salt, as I was a bit late with the screen grab – some sessions had already begun to log off – but let’s look at these side-by-side with our ‘non-gateway’ results:

rickd_table2

Yeah – you read that correctly! CPU utilization was within 0.22% and memory utilization was within 0.06%. By this point, I think it’s safe to say that HyperFlex doesn’t care if the sessions run through an external NetScaler. Good to know!
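For anyone who wants to reproduce the back-of-the-napkin math, it’s just used-over-capacity. The capacity and usage figures below are hypothetical stand-ins to show the arithmetic, not the actual readings from our vCenter screenshot:

```python
# Back-of-the-napkin utilization math – a sketch.
# The capacity/usage figures below are hypothetical placeholders,
# NOT the actual values from our vCenter screenshot.
def utilization_pct(used, capacity):
    """Return resource utilization as a percentage, rounded to 2 places."""
    return round(used / capacity * 100, 2)

cpu_used_ghz, cpu_capacity_ghz = 175.0, 220.0    # hypothetical
mem_used_gb, mem_capacity_gb = 1382.0, 2046.0    # hypothetical

print(utilization_pct(cpu_used_ghz, cpu_capacity_ghz))  # e.g. 79.55
print(utilization_pct(mem_used_gb, mem_capacity_gb))
```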

5 – First Lessons Learned Measuring NetScaler Utilization

Before I get too far into this, let me remind you that I’m a relative NetScaler noob. This is the first time I’ve really dug into the metrics, so like any well-trained noob would do, I started by looking at the built-in reports. I’ve picked out three that I think will be useful as we wade our way into this exercise. As we parse through these, remember that our workload started coming on at 11:39, and logoffs started about 50 minutes or so later.

I’ll start with CPU vs. Memory Usage and HTTP Request Rate:

rickd010

I don’t quite know how to interpret the Total HTTP requests (rate) value yet, but I can clearly see that it started going up at the beginning, flat-lined at some point pretty early on, and dropped off as sessions started dropping off. I’m guessing that’s not too relevant to our discovery here, so let’s move on.

The yellow line shows us the % of In Use Memory. It may have ticked up just a little during the test, but not much – it never broke 20%.

The orange line shows us Packet CPU usage %. There we can see a correlated increase as sessions log on and start pumping packets through, but again it’s negligible.

My first read? This sucker isn’t even close to breaking a sweat! Do keep in mind that since this is a NetScaler VPX instance running on an SDX appliance, I was able to provision it with a couple of SSL offload chips to help with the SSL processing. It’ll be interesting to see how this compares to running a VPX on a ‘normal’ hypervisor/server platform without SSL offload chips – I expect we’ll see more CPU usage as the offload is handled by the regular vCPUs…

Next, let’s look at the Current ICAOnly sessions report:

rickd011

This chart gives us a clear visual of the number of sessions being pumped through the NetScaler. We can see it rising linearly as LoginVSI keeps loading it up with sessions, and the high point on the chart is at 525, which correlates with the number of users we were running through. Cool!

Now let’s take a look at the amount of data pumping through the NetScaler during the test by examining the Megabits received vs. transmitted report:

rickd012

This one’s kind of interesting! Again, we can see a clear correlation between the time the sessions start logging in and ramping up, then tailing off as they start logging off. I’m a bit surprised to see the transmitted and received rates correlate so closely… I’m not sure this is the right measure to use yet, but I’ve got a feeling this’ll help us determine what size of VPX someone needs to purchase for this type of workload… we’ve got some fun discovery work ahead of us!
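One way to start turning a chart like this into sizing guidance is simple per-session arithmetic: divide peak throughput by concurrent sessions. The peak-throughput figure below is a hypothetical placeholder, not a value read off our chart:

```python
# Rough per-session throughput arithmetic – a sketch, not a sizing tool.
# peak_mbps is a hypothetical placeholder, not read from our chart.
def per_session_kbps(peak_mbps, sessions):
    """Average per-session throughput (kbps) at peak load."""
    return round(peak_mbps * 1000 / sessions, 1)

peak_mbps = 150.0   # hypothetical peak transmit rate in Mbps
sessions = 525      # peak concurrent ICA sessions from our test

print(per_session_kbps(peak_mbps, sessions))
```

Multiply that per-session figure back out by your expected user count and you have a first-pass throughput target to compare against a VPX model’s rated bandwidth.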

6 – Next Steps

As you might expect, it’s recap time! Building upon the Baseline Workload work we did in Part 1 of this series, we explored a problem we didn’t know we had, and got a taste of the power of the Citrix community who came through and solved it for us. We walked through how we implemented our new NetScaler-friendly launcher, and peeked under the covers into how we’re executing these tests. We then explored the results from a test run through the gateway, leveraging the exact same baseline workload we’d previously determined was the best for our HyperFlex test rig. We learned that HyperFlex doesn’t care if the sessions run through a NetScaler – at least one running off the cluster – as the results we got from our comparative test runs were pretty much identical.

What’s next? Here’s the plan for the next couple posts in this series, which I think may go well beyond the 4 parts originally envisioned:

Part 3: Testing the Baseline Workload through NetScaler VPX (on HyperFlex)

In the third post of this series, we’ll go over the basic configuration of the NetScaler VPX we’ll be running alongside our hosted shared desktop workload. We’ll tweak our LoginVSI launch parameters to use it, and finally we’ll execute and analyze some tests to see how performance and scale compare when you run your NetScaler VPXs alongside your desktop workload.

Part 4: Post-game Analysis and Lessons Learned

In the final post of this series (for now! ;-)) I’ll run back through what we learned from this exercise. I may even bring ESXTOP numbers into the mix to see if we can find anything interesting! Finally, I’ll lay out our plan for answering the remaining questions on our list.

That’s all for now, but you can expect some more sharing out of the Silverton lab in the near future. I hope you enjoyed this installment, and that this project will somehow make a difference in your world!

Respectfully,

/@RickD4Real
