By now many of you have had a chance to kick the tires on the upcoming Power and Capacity Management feature of XenApp, currently in its Tech Preview phase, and you may have seen Sridhar's blog on the subject. What I'd like to cover here, and hopefully get your feedback on, is what's behind this new approach and where we are headed.
First, let's make it clear that the name "Power and Capacity Management" (PCM) does not really do this feature justice. For one thing, we are hearing a lot of partners and customers tell us that one of the main use cases for PCM is maintenance. With PCM you can set servers you need to take down to a low "tier," and sessions will drain as users naturally log off their applications. Once all sessions are logged off, the server can be placed in maintenance mode and PCM will ignore it until it is placed back into a pool. You can either use this method on its own, which is very safe because it allows for sudden peaks in demand (a server in a low tier will still be utilized if usage suddenly spikes), or you can combine it with disabling logons (an oldie but a goodie) to ensure you hit your maintenance window. The bottom line is you can actually do maintenance on XenApp servers during "normal" hours, or at least not in the middle of the night or on Sunday mornings!
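To make the maintenance flow concrete, here is a minimal sketch of the tier-and-drain logic described above. The class, field names, and the `hard_drain` option are illustrative assumptions, not the actual PCM API.

```python
# Hypothetical sketch of the PCM maintenance drain flow; names are invented.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    tier: int            # lower tier = loaded last, drained first
    sessions: int
    logons_enabled: bool = True

def begin_maintenance(server: Server, hard_drain: bool = False) -> None:
    """Move a server to the lowest tier so sessions drain naturally.
    With hard_drain, also disable logons to guarantee the maintenance
    window (at the cost of absorbing a sudden demand spike)."""
    server.tier = 0
    if hard_drain:
        server.logons_enabled = False

def ready_for_maintenance(server: Server) -> bool:
    # Only safe to place in maintenance mode once fully drained.
    return server.sessions == 0
```

The safe path is the default (tier drop only); passing `hard_drain=True` corresponds to combining the tier drop with disabled logons.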
But let's back up for a second. If you stop to think about it, you realize that this is not the way XenApp load balancing works. Going back through Presentation Server, MetaFrame, WinFrame and all the other names in our legacy (yeah, I know, don't remind me), our load balancing has always been akin to the Gas Law – our sessions will expand and sprawl out to fill every server available. If you put 100 servers in a farm and there are 200 sessions running, there will probably be 2 sessions per server. Not a very efficient use of hardware or energy when you consider each server might be able to run 100 or more sessions on its own! Enter PCM, which is a huge departure from the old way of doing load balancing in a XenApp environment. Rather than loading up sessions uniformly across all available servers in the farm, we simply load each server in a pool, one at a time, until it reaches what we call "optimal load." This allows servers that are not in use during off-peak hours to remain powered off and save on energy costs. As demand for computing supply increases, we bring more servers online and load more sessions. As demand wanes, we drain servers of sessions and turn excess capacity off. For this version of PCM, we simply ask the administrator for the maximum number of users per server (per pool or silo) that can run without performance issues. We then use that number as the "optimal load" value while loading sessions onto servers in a PCM-controlled farm.
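The difference between the two placement strategies can be sketched in a few lines. This is purely illustrative, assuming a fixed server list and a single `optimal_load` value per pool, as described above.

```python
# Contrast between the classic "Gas Law" spread and PCM-style consolidation.
# Function names and parameters are illustrative assumptions.
def assign_spread(session_count: int, servers: list[str]) -> dict[str, int]:
    """Old behavior: sessions expand to fill every available server."""
    loads = {s: 0 for s in servers}
    for i in range(session_count):
        loads[servers[i % len(servers)]] += 1
    return loads

def assign_consolidated(session_count: int, servers: list[str],
                        optimal_load: int) -> dict[str, int]:
    """PCM behavior: fill one server to its optimal load before touching
    the next, so trailing servers can stay powered off."""
    loads = {s: 0 for s in servers}
    remaining = session_count
    for s in servers:
        take = min(optimal_load, remaining)
        loads[s] = take
        remaining -= take
        if remaining == 0:
            break
    return loads
```

With 4 servers, 200 sessions, and an optimal load of 100, the spread strategy puts 50 sessions on every server, while the consolidated strategy fills two servers and leaves two idle (and powered off).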
That's how PCM behaves today. Here's where we're going. Imagine if we could take the guesswork out of the "optimal load" number. What if, rather than asking the administrator to enter that number, the system figured it out on its own? We have a lot of the pieces in place to monitor server and session performance, so it's not much of a stretch for some component like EdgeSight to "observe" session performance and determine, based on historical analysis and other heuristics, what the optimal load value is for a given pool or silo, and adjust it upward or downward over time to meet seasonal fluctuations. With this approach you would no longer need a static number for each workload or silo. Each one would be calculated by the system, and each would change depending on overall user demand. This would make for a much more efficient environment, since each silo would always have an optimal number of servers assigned – no waste.
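One very simple way such a feedback loop could work is sketched below. The response-time target, thresholds, and step size are invented for illustration; a real implementation would lean on richer EdgeSight-style historical analysis.

```python
# Hedged sketch of self-tuning "optimal load": raise the value while
# observed session performance stays comfortably under target, back it
# off when performance degrades. All thresholds here are assumptions.
def adjust_optimal_load(current: int, avg_response_ms: float,
                        target_ms: float = 500.0,
                        step: int = 5, floor: int = 10) -> int:
    if avg_response_ms > target_ms:
        # Sessions are suffering: lower the per-server ceiling.
        return max(floor, current - step)
    if avg_response_ms < 0.8 * target_ms:
        # Plenty of headroom: allow more sessions per server.
        return current + step
    return current  # within the comfort band, leave it alone
```

Run periodically per pool or silo, a loop like this would let each workload converge on its own value and drift with seasonal demand.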
What's more, not only would we save power by having only the optimal number of servers on, we could also re-task servers on the fly to meet growing demand for one workload over another. Imagine you have two groups of users who use two different application sets – say, Engineering and Finance. Today you would build a couple of silos and probably have to over-provision both to account for the worst-case usage scenario. With a dynamic farm approach, the system determines over time how many servers to dedicate to each of the different user groups (or app sets). No guesswork and no waste. Leveraging the pieces we have in place today (PCM, server virtualization, Provisioning Services, EdgeSight, etc.) with some additional glue, we could build a very dynamic environment that is very different from the XenApp farms of today. We are calling this concept "Autonomic Farm Management," or "Autonomic App Delivery" (more marketingish). The question is, would you want your farm to behave this way, or would you want to manage it in the current manner? Is this elastic behavior preferred over the more static approach, or are there reservations? Tell us your thoughts.
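The re-tasking idea reduces, at its simplest, to splitting a shared pool across silos in proportion to measured demand rather than statically over-provisioning each one. The sketch below is purely illustrative, not an actual PCM or XenApp interface.

```python
# Assumed sketch of demand-driven re-tasking across silos: a shared pool
# of servers is divided in proportion to each silo's current session
# demand, with any remainder going to the busiest silo.
def allocate_servers(total_servers: int,
                     demand: dict[str, int]) -> dict[str, int]:
    total_demand = sum(demand.values()) or 1  # avoid division by zero
    alloc = {silo: (total_servers * d) // total_demand
             for silo, d in demand.items()}
    leftover = total_servers - sum(alloc.values())
    busiest = max(demand, key=demand.get)
    alloc[busiest] += leftover
    return alloc
```

With provisioning on top (Provisioning Services streaming the right workload image to each re-tasked server), an allocator like this could rebalance the Engineering/Finance split as demand shifts during the day.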
For heads-up on future postings, follow me on Twitter.