The term “Cloud Federation” is used increasingly in the literature, in our own blog postings and in discussions, and it doesn’t take much to realise that different people use it to mean different things. Originally it was a political term, a federation being described as “a type of sovereign state characterised by a union of partially self-governing states or regions united by a central (federal) government” [Wikipedia]. The entry continues: “In a federation, the self-governing status of the component states is typically constitutionally entrenched and may not be altered by a unilateral decision of the central government.” It’s this second statement that holds the crux of the matter: the independence of the component states, which allows them to differ from each other.

Of course, the world’s best-known political federation is the USA, and because the USA is a federation and has a central government, some people fall straight into the rather obvious syllogistic trap of assuming that the existence of a federation implies the existence of a central government too. Others prefer waffle to substance in their definitions. Given the infuriating modern custom of “selling benefits not features”, it’s actually quite easy to find definitions of the wonders that Cloud Federation will supposedly deliver, but quite hard to find a definition of Cloud Federation itself, or any hint of how to achieve it – which suggests that most people don’t really know what it is. Take a look at these:

  • From searchcloudprovider.techtarget.com: “Cloud federation is the practice of interconnecting the cloud computing environments of two or more service providers for the purpose of load balancing traffic and accommodating spikes in demand.” [No, ‘fraid it isn’t.]
  • From whatis.techtarget.com: “A federated cloud (also called cloud federation) is the deployment and management of multiple external and internal cloud computing services to match business needs.  A federation is the union of several smaller parts that perform a common action.” [Not exactly.]
  • From cloudswitch.com blogs: “The ability to federate this heterogeneous ecosystem—to create a uniform environment spanning external and internal clouds—is going to allow IT organisations to meet user and corporate needs with an agility and economy not previously possible.” [Better, but rather vague.]
  • From Cisco’s Cloud Computing Primer: “One definition of cloud federation as proposed by Reuven Cohen of Enomaly follows: Cloud federation manages consistency and access controls when two or more independent geographically distributed clouds share either authentication, files, computing resources, command and control, or access to storage resources.” [More like it, and more detailed … but still not quite there.]

Different clouds are owned, managed and governed by different authorities, none of which can compel any of the others to adopt a “foreign” way of doing things. Compulsion could be achieved only by a higher authority issuing a diktat, but as we have seen there may be no higher authority, and even if there were, cloud authorities might still ignore it despite promised sanctions. Therefore no federation can be achieved without the agreement of the parties concerned to cooperate in order to overcome heterogeneity. In other words, your cloud A can only federate with another cloud B to the extent permitted by cloud B, and vice versa. And before cooperation is achieved, there must be a period of negotiation in order to establish the nature and extent of that cooperation.
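To make the point about negotiation concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each authority can state what it is willing to offer and what it would like to receive as simple sets of capability names. The class `CloudPolicy`, the function `negotiate` and the capability names are all hypothetical; real negotiations cover far more than a set intersection.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudPolicy:
    """One authority's position before negotiation (hypothetical model)."""
    name: str
    offers: frozenset[str]    # capabilities this cloud is willing to expose
    requests: frozenset[str]  # capabilities this cloud would like from its peer


def negotiate(a: CloudPolicy, b: CloudPolicy) -> dict[str, frozenset[str]]:
    """Return, for each direction, only what the provider permits
    AND the consumer actually asked for."""
    return {
        f"{b.name}->{a.name}": a.requests & b.offers,  # what B provides to A
        f"{a.name}->{b.name}": b.requests & a.offers,  # what A provides to B
    }


# Illustrative positions: neither side can compel the other.
enterprise = CloudPolicy(
    name="enterprise",
    offers=frozenset({"identity-assertions"}),
    requests=frozenset({"compute", "storage", "user-management"}),
)
csp = CloudPolicy(
    name="csp",
    offers=frozenset({"compute", "storage"}),
    requests=frozenset({"identity-assertions", "billing-contact"}),
)

agreement = negotiate(enterprise, csp)
```

In this sketch the enterprise asks for user management but does not get it, because the CSP never offered it: the extent of the federation is whatever survives on both sides.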

The establishment of a federation between two generic systems A and B will typically mean that resources and functionality hosted by one system are made available to the other. In a symmetrical relationship, the authorities controlling A and B would share equally, and when everything is shared so that A is controllable completely from B and B from A, it’s hard not to think of them as having merged – except for the fact that it only takes a change of policy by one authority to end the arrangement. With federated clouds, however, the relationship is typically asymmetric; the enterprise will generally want to use the facilities provided by a Cloud Service Provider (CSP), rather than the converse. However, asymmetric does not mean one-way: the right information must flow in each direction for the management of a cloud federation to work.

In a cloud federation, the boundary between clouds is still there, but aspects of the boundary that would normally prevent interoperability will have been overcome. Whether or not the boundary is apparent will depend on who you are and what you are trying to accomplish. If you are an end-user trying to access a desktop hosted on a remote cloud, then every effort will have been made to hide this boundary from you, so that you are in a state of blissful ignorance (at least, with respect to the federation). If you are an administrator trying to balance resource usage across your datacentre’s private cloud and a third-party public cloud (perhaps to minimise cost), then you very much want to be able to see the boundary and what’s happening either side of it. So the important point is not that the federation boundary is hidden, but that it can be hidden when you need it to be.

In order to achieve federation, two issues must be overcome: mutual mistrust, and technical discontinuity. Mutual mistrust can be mitigated by a robust approach to security, both within the cooperating clouds and across them, but this can only raise confidence so far: other measures (such as CSP certification and SLAs) must be employed to finish the job. The last resort will always be litigation (i.e. deferring to a higher authority for issue resolution). In any case, the acceptability question to be answered is not “is this federation secure?” but “is it secure enough, in the right ways, for what I want to do?” Do bear in mind, also, that ensuring security within each cloud is not sufficient to ensure security between clouds – this needs approaching as an issue in its own right, and any inter-cloud and intra-cloud security mechanisms must also interwork where needed. Hic sunt dracones: as Alexander Pope might have said, never was a little learning a more dangerous thing than in the field of distributed security. Mutual mistrust will probably also exist in other areas, creating judicial, contractual, economic, social, political and cultural boundaries, all of which may need attention, discussion, and eventual agreement on ways to overcome them.

Technical discontinuity is a little more tractable, but also probably more complex than most people realise. It’s not a single thing, but rather the aggregation of a whole bunch of technical discontinuities, each covering a different aspect of technology. Each discontinuity will have to be analysed, a means of bridging each gap devised, and appropriate mechanisms put in place – which may be on either or both sides of a federation boundary. You would hope that your CSP will have done most of this work for you, but even so you must still be aware of what they have done, and act accordingly.

Let’s examine one of the most common mechanisms for crossing a federation boundary: the Application Programming Interface (API). It’s not enough just to list a set of functions which can be called by a potential user. For the boundary to be crossed effectively and reliably, the number and type of each function’s arguments and of any returned value (the syntax) must be known by both user and provider. Just as crucially, the semantics must be identical on both sides, otherwise API calls could be meaningless. Similarly, the same understanding of the protocols (the permissible sequences of API calls) must exist on both sides, again in syntax and semantics. All possible error conditions need to be shared, with their meanings and possible consequences. That’s quite a lot of work, just to make sure that when you use an API, you use it the way it was intended to be used. And in any good IaaS fabric, there will be plenty of APIs, each offering access to a different part or a different level of the CSP’s empire, from virtual network configuration via virtual machine instantiation to user management.
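The points about protocols and shared error conditions can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the virtual machine lifecycle, the call names (`create`, `start`, `stop`, `destroy`) and the error catalogue are invented, and do not describe any real CSP’s API. The idea is simply that both sides hold the same table of permissible call sequences and the same list of errors with agreed meanings.

```python
from enum import Enum


class VmError(Enum):
    """Error conditions both sides agree on, each with its agreed meaning
    (hypothetical catalogue)."""
    QUOTA_EXCEEDED = "caller has no remaining quota; nothing was created"
    UNKNOWN_VM = "no VM with that identifier exists on the provider side"


# The protocol: which calls are permitted in which state, and where they lead.
# This lifecycle is invented for illustration.
ALLOWED = {
    "absent":  {"create": "stopped"},
    "stopped": {"start": "running", "destroy": "absent"},
    "running": {"stop": "stopped"},
}


class ProtocolViolation(Exception):
    """Raised when a call is made in a state where the protocol forbids it."""


def check_sequence(calls, state="absent"):
    """Replay a sequence of call names against the agreed protocol
    and return the final state."""
    for call in calls:
        try:
            state = ALLOWED[state][call]
        except KeyError:
            raise ProtocolViolation(
                f"'{call}' is not permitted in state '{state}'"
            ) from None
    return state


final_state = check_sequence(["create", "start", "stop", "destroy"])
```

If user and provider both validate against the same table, a call such as `start` on a machine that has not yet been created is rejected in the same way, and for the same stated reason, on either side of the boundary.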

Unfortunately we’re not finished: there’s yet a further aspect that needs ironing out for federation to be possible, and that is naming. Wherever we have names for things – whether it’s IP addresses or users or files or program instances – we have to understand how our names for things translate into the CSP’s names for the same things, and vice versa. It is absolutely vital to ensure that any name clashes can be correctly resolved: names must be unique, and resolvable to the things they name, or your system won’t work. A good CSP, though, will try to ensure that you can use your naming systems for things that happen in their datacentres, because that makes life easy for you, the customer; you can then see cloud use as an extension of what you have, rather than a bridge to something alien.
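As a rough illustration of name translation, the following Python sketch keeps a strict one-to-one mapping between local names and provider names, and refuses any binding that would create a clash. The class `NameMap` and the example names are hypothetical; a real federation would need this per kind of name (users, addresses, files and so on) and probably with scoping rules as well.

```python
class NameClash(Exception):
    """Raised when a binding would make a name ambiguous."""


class NameMap:
    """Bidirectional, one-to-one mapping between our names and the CSP's names."""

    def __init__(self):
        self._to_remote = {}
        self._to_local = {}

    def bind(self, local, remote):
        # A local name may map to only one remote name, and vice versa.
        if self._to_remote.get(local, remote) != remote:
            raise NameClash(f"{local!r} is already bound to {self._to_remote[local]!r}")
        if self._to_local.get(remote, local) != local:
            raise NameClash(f"{remote!r} is already bound to {self._to_local[remote]!r}")
        self._to_remote[local] = remote
        self._to_local[remote] = local

    def to_remote(self, local):
        """Translate one of our names into the provider's name."""
        return self._to_remote[local]

    def to_local(self, remote):
        """Translate one of the provider's names back into ours."""
        return self._to_local[remote]


# Illustrative names only.
users = NameMap()
users.bind("alice@corp.example", "tenant42/u-0017")
```

Because the mapping is checked in both directions, every name resolves to exactly one thing whichever side of the boundary you start from.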

So, in summary, it’s mainly heterogeneity that makes the federation of clouds difficult, and it should now be obvious to any reader that federation is not achieved just by connecting clouds together with a wire. (Readers are therefore strongly advised to be very careful when using the terms “federation” and “federated”, and not to apply them inappropriately to any old systems that just happen to be connected to each other.) But it’s not all bad news – heterogeneity also brings opportunity. It increases flexibility and choice, allowing specialist variations of cloud features and behaviours, a range of performance levels, pricing differentials, competition, cooperation, and the possibility of a rich cloud ecosystem. A homogeneous cloud federation – where everything is built out of one and only one kind of technology – might well work, but because of its inbuilt limitations, there’s no guarantee that it will work as well as a heterogeneous cloud federation could in addressing the whole gamut of cloud usage scenarios.

Despite the additional technical complexity to be overcome, Citrix has always been a fan of heterogeneity – right back to the “any, any, any” days of Presentation Server – and our customers ultimately benefit. We know, because they’ve told us. We see no reason not to continue this policy while we get on with the task of attacking the problems of the heterogeneous cloud – which is, in truth, the most complex kind of distributed system there is. Contrast that with the somewhat pusillanimous approach of our main rival, VMware, who seem to prefer homogeneous clouds for their ease of engineering, but only if they are built with their own, rather inflexible products that shun interoperability, to the detriment of everyone except themselves. Not exactly an heroic stance on their part, is it?