At NetScaler, we sowed the seeds of clustering in 2007 when we embarked on our journey to the nCore architecture. The shared-nothing, communicating-sequential-processes model of nCore was designed with distributed multi-core, multi-node topologies in mind, in anticipation of not only NUMA and NUIOA, but also horizontal scale-out clustering.
Countless lines of code changes in numerous files, a few dozen patents and 5 years later, we are now ready to usher ADCs into the realm of Terabit-per-second networking with NetScaler TriScale Clustering… Just like that.
Today, I just want to tell you about what TriScale Clustering has to do with Pizzas and Zebras. We, the NetScaler team, will have much more to say about TriScale clustering over the next several weeks. Over a series of postings we will share with you some detailed rationale behind what we did and why we did it.
TriScale clustering is what happens when an irresistible force, the cloud era, meets an immovable object, the big-iron chassis of yore. Let us start with the trouble with chassis.
The Trouble with Chassis
In the olden days, it used to be that whether you scaled using discrete boxes and clever software, or through a hardware chassis, was largely a matter of taste. And sometimes religion.
All of that has changed now.
There are two sides to this change.
First, it is getting much harder to live with the compromises that a physical chassis imposes. This is especially true in the cloud era of today, where the increasingly dynamic nature of data centers and operations demands a lot more flexibility, and a lot less waste.
And second, with TriScale in NetScaler 10, there is no longer a need to live with those compromises.
The trouble is that an ADC built around a physical chassis is forced to pick an arbitrary and often shockingly small number for how many blades a chassis might be able to accommodate. In some chassis, the choice of backplane architecture makes this issue particularly acute, limiting the number of blades to just four or eight.
The situation would not be so bad if there were a way out once you max out one chassis. It might still be workable, in spite of the inevitable performance penalty, if you could continue to add capacity without skipping a beat by, for example, adding more blades to a different chassis.
But such is not the case with chassis-based ADC offerings today. The chassis imposes a rigid, non-negotiable boundary. When you reach the limit of that 4-blade chassis, it is forklift time. You’re now in the market for a shiny new 8-blade chassis, if such a thing exists at all. And what if you run out of that? Well, as Porky Pig would say, “Th-th-th-that’s all folks!”
Double or Quit
The chassis, however, introduces a new dimension along which things can be painfully incompatible.
Blades tend to be specific to the chassis they are designed for. So don’t expect those blades you already have for the mid-range chassis to just work in the higher-performance chassis when the moment arises. When you run into a chassis limit, you don’t just write off your investment in the now-too-small chassis; you also write off all those blades that it was fully populated with.
The simple fact is that you buy your chassis, well, by-the-chassis. And dreamy marketing claims notwithstanding, this is a pay-now, use-later scheme. That empty chassis is some of the most expensive sheet metal. One can buy an entire automobile for less money than what some of these empty boxes cost!
This is not pay-as-you-grow. This is not just-in-time. This is not cloud. This is pay-us-just-in-case. This is simply not the way folks want to build cloud-motivated architectures. This “chassis tax” that you must pay for those empty slots takes the proverbial win-win and dedupes it to just a single-party win. And that party, dear reader, is not you.
So long as it is black
“You can have any color as long as it is black,” Henry Ford is said to have said. Although it is not an inherent drawback of a chassis, a similar thing is going on. Even though ADC chassis vendors often have regular non-chassis offerings as well, those non-chassis products appear to be falling behind the times. It is invariably the case that substantial chunks of functionality are shipped with the “only available on chassis models” disclaimer. This is especially true of new engineering work coming out of these companies.
You can run any feature you want, just so long as you run it on the chassis product.
Now, I don’t think this is an elaborate conspiracy to promote chassis over non-chassis products. But it is a case where the reality of engineering economies works out to the disadvantage of the customer.
For one, distributed algorithms that assume a chassis allow architects to take shortcuts in implementation, since they often overlook a large set of failure scenarios on the arguable assumption that those failures are unlikely because “it is a chassis”. And second, chassis-based hardware is often sufficiently different from non-chassis hardware that sooner or later the engineering team is forced to take a fork. “It is easier to implement on the chassis, so let us just support it on our chassis,” they say.
Once down this slippery slope, before they know it, they are maintaining multiple code lines, and being forced to prioritize chassis over others. Hence the dichotomy.
But there is more to it than that. It is wasteful. It is wasteful to light up a 2,000W power supply if what you need to operate is one 400W blade. Even highly efficient power supplies are not nearly as efficient when operating significantly outside the range of their design loads. Further, it is wasteful to have to occupy seven or more rack units of space when a single rack unit could have sufficed.
Couple this with the fact that chassis today are not robust enough to be deployed standalone, so most customers end up deploying them in active-passive pairs, and you see just how much waste is inherent in the sparse-chassis + spare-chassis model.
And then there is the issue of internal fragmentation. When you combine the rigid boundary that a chassis imposes with the fact that not every chassis is fully populated, you get a scenario where you not only have a bunch of empty slots sprinkled across your chassis, but also have no easy way to “consolidate” this large number of sparsely populated chassis into a smaller number of more densely populated ones. This is what we software architect types call internal fragmentation, and it has been an issue in not-so-carefully-designed systems for as long as computing has been around.
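To put a rough number on that fragmentation, here is a minimal sketch. The fleet below (five 8-slot chassis and their blade counts) is entirely hypothetical and chosen only for illustration, and the function name is mine. It compares the number of chassis deployed with the fewest that could hold the same blades if blades could be moved freely between them.

```python
import math

def fragmentation(blades_per_chassis, slots_per_chassis):
    """Summarize slot usage across a set of identically sized chassis.

    blades_per_chassis: populated-slot count for each chassis (hypothetical data).
    slots_per_chassis:  slot capacity of every chassis (hypothetical value).
    """
    total_blades = sum(blades_per_chassis)
    deployed = len(blades_per_chassis)
    # Fewest chassis that could hold the same blades if they could be consolidated.
    minimum = math.ceil(total_blades / slots_per_chassis)
    # Slots that were paid for but sit empty.
    empty_slots = deployed * slots_per_chassis - total_blades
    return {
        "deployed_chassis": deployed,
        "minimum_chassis": minimum,
        "empty_slots": empty_slots,
    }

# Hypothetical fleet: five 8-slot chassis, each only partly populated.
report = fragmentation([3, 2, 4, 1, 2], slots_per_chassis=8)
print(report)
```

In this made-up fleet, twelve blades are spread over five chassis when two would do, and getting from five to two means physically pulling and reseating blades.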
Need to get physical
Automation is a paramount consideration in today’s cloud era. When it comes to automation of provisioning and deployment, the question I like to ask is – “can I accomplish this from my iPad, while lounging on a beach in Maui?” In other words, is this a logical action or a physical one? The chassis falls woefully short on this requirement. Unless you have a nifty robot racing up and down the cold aisles physically inserting and ejecting blades in and out of the chassis, that is.
“Wait a second!” I hear some exclaim nervously. “We have structured cabling. We have just revamped the whole thing to be a ‘flat network’. In our network, anyone can talk to anyone at tremendous speeds. So what you’re saying doesn’t apply to us. Right? RIGHT?”
Wrong. Unfortunately, while you were busy eradicating physical boundaries from your layer 2 and 3 network, the chassis-based ADCs were busy erecting new ones. If you need more capacity in chassis A, but your spare blade happens to be in chassis B, it is time to say goodbye to that beach in Maui. That blade needs to be physically ejected from chassis B, and physically inserted into chassis A. Then and only then can chassis A utilize that blade. The chassis does not know how to leverage your modern, cloud-ready, low-latency, high-bandwidth, super-flat, structured-cabling network. And the chassis vendors don’t seem to offer any nifty robots either. Note to ADC chassis vendors: If you are reading this, make sure the slot-shifter robot has a fast RESTful API.
When one looks at these ADC blades, it is evident that a number of tradeoffs must be made to adhere to the physical form factor and the power and thermal envelopes that a blade affords. Compared to their regular-sized brethren, ADC chassis blades often feature middling processors, limited amounts of memory, and numerous compromises on the storage and crypto-acceleration schemes.
As an example, when SSL certificates moved to 2,048-bit keys a couple of years back, Citrix NetScaler promptly resolved the customer dilemma with an appliance that substantially increased the SSL capacity for 2,048-bit certificates. Don’t expect any chassis-based system, even with all its blades populated, to outdo a top-of-the-line 2U NetScaler appliance any time soon, though. There is just no way to add the necessary crypto offload hardware within a blade’s limiting form factor.
The TriScale Clustering Way
I have been waxing poetic about the drawbacks of chassis-based ADCs. The fact is that when we started, we seriously weighed a chassis-based design, but we found it wanting.
We made the choice to go with the TriScale clustering model after much deliberation. It is a model that is considerably more flexible and considerably more robust. It is also one that requires considerably more engineering effort and, more importantly, considerable engineering discipline to pull off. With its no-compromises philosophy, this is the kind of thing that only becomes possible when one plans ahead. Five years ahead, as it turned out.
We insisted from the very beginning that nCore and clustering not be a “special case”. It is not one of the “modes”, it is the mode. We shied away from using special hardware to solve problems of software. As a result, TriScale clustering is available on every NetScaler appliance – physical or virtual.
TriScale clustering does not impose artificial boundaries around a cluster. If you have the appliances, you can make a cluster of them. Full disclosure: we do have a limit of 32 nodes in a cluster. This allows you to build up to a 1.6Tbps cluster using current hardware, and allows our QA engineers to sleep peacefully, knowing that we actually tested what we claim!
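For the arithmetic-minded, the two figures above (32 nodes, 1.6Tbps) imply roughly 50Gbps per node. That per-node number is derived here by simple division, not quoted from a datasheet, and the sketch below assumes capacity scales perfectly linearly with node count, which is a simplification. The names are mine.

```python
MAX_NODES = 32            # node limit per cluster, as stated above
MAX_CLUSTER_GBPS = 1600   # 1.6 Tbps, as stated above

# Derived, not quoted: per-node throughput implied by the two figures above.
PER_NODE_GBPS = MAX_CLUSTER_GBPS // MAX_NODES

def cluster_gbps(nodes):
    """Aggregate throughput for a cluster of `nodes` appliances.

    Assumes perfectly linear scaling, which is a simplification.
    """
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"cluster size must be between 1 and {MAX_NODES}")
    return nodes * PER_NODE_GBPS

for n in (1, 8, 32):
    print(f"{n:2d} nodes -> {cluster_gbps(n)} Gbps")
```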
No more do you have to worry about having to plug or unplug wires. Or cancel that vacation in Maui. If you can reach it, you win.
True capacity on demand
When we adopted the logical addition or removal of nodes as opposed to physically inserting or removing them, we unlocked the door to a dynamic cluster. This is true capacity on demand. Want more capacity – add a node. Have too much capacity – remove a node. All while the cluster is in operation. Without losing user sessions.
Failures get handled similarly. Missing nodes mean lost capacity – never lost functionality. Nodes that become available self-heal into the cluster.
This is the part about Pizza. It has been said that the Internet was built one pizza box at a time. Well, so is a TriScale Cluster.
We will talk about attributes of a TriScale cluster, and design concepts behind them, in subsequent blogs. But before I end this blog, I had better address the reference to a certain Zebra earlier.
TriScale Clustering (and nCore) is heavily based on the concept of striping. Striped like the Zebra. Well, actually, it is a term we borrowed from RAID storage. At a very high level it is the idea of distributing load, data structures and traffic among the nodes in such a way that each node performs a subset of the work and owns a subset of the information, with distributed algorithms tying things together seamlessly.
You might spot zebras or leopards in a zoo, but the time is not far off when you will have to go to a museum to spot an ADC chassis. We’ll talk about leopards next time.
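To illustrate the general idea that a departing node takes capacity with it but leaves everything else undisturbed, here is a small sketch using rendezvous (highest-random-weight) hashing. To be clear, this is a well-known textbook technique that I am substituting for illustration; it is not a description of how TriScale assigns work internally, and the node names and flow keys are made up.

```python
import hashlib

def score(node, flow):
    """Deterministic pseudo-random weight for a (node, flow) pair."""
    digest = hashlib.sha256(f"{node}|{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def owner(nodes, flow):
    """Rendezvous hashing: the node with the highest score owns the flow."""
    return max(nodes, key=lambda n: score(n, flow))

nodes = ["node-1", "node-2", "node-3", "node-4"]      # hypothetical cluster
flows = [f"client-{i}" for i in range(1000)]          # hypothetical flow keys

before = {f: owner(nodes, f) for f in flows}

# node-3 leaves the cluster (failure or deliberate removal).
survivors = [n for n in nodes if n != "node-3"]
after = {f: owner(survivors, f) for f in flows}

moved = [f for f in flows if before[f] != after[f]]

# Only the flows that lived on the departed node get a new owner;
# every flow is still owned by some surviving node.
assert all(before[f] == "node-3" for f in moved)
assert all(after[f] in survivors for f in flows)
print(f"{len(moved)} of {len(flows)} flows changed owner")
```

The sketch only shows that ownership elsewhere stays put. Keeping the sessions on the departed node alive takes more machinery than this.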
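For readers who think in code, striping in the RAID 0 sense can be sketched in a few lines. This is a toy under stated assumptions, not NetScaler's implementation: the stripe count of 64, the CRC32 hash, the round-robin deal and the node names are all hypothetical choices made for the example.

```python
import zlib
from collections import Counter

NUM_STRIPES = 64  # hypothetical; fixed number of stripes the key space is cut into

def stripe_of(key):
    """Map a key (say, a connection identifier) to a stripe number."""
    return zlib.crc32(key.encode()) % NUM_STRIPES

def build_stripe_map(nodes):
    """Deal stripes to nodes round-robin, the way RAID 0 deals blocks to disks."""
    return {s: nodes[s % len(nodes)] for s in range(NUM_STRIPES)}

def owner_of(key, stripe_map):
    """The node that owns the stripe also owns the work and state for the key."""
    return stripe_map[stripe_of(key)]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
stripe_map = build_stripe_map(nodes)

# Each node owns an equal subset of the stripes.
stripes_per_node = Counter(stripe_map.values())
print(stripes_per_node)

# Any node can compute the owner of any key without asking anyone.
print(owner_of("10.0.0.1:443", stripe_map))
```

With four nodes and 64 stripes, each node ends up owning 16 stripes, and the owner of any key is a pure function of the key and the stripe map.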