Peter Phaal (yet another S. African contributor to virtualization) at InMon sent me a DM following up on my recent post about the strategic importance of the Open vSwitch project and its inclusion in the forthcoming XenServer “Cowley” release. He makes a great point about the project’s value in providing granular performance metrics, from the host VIF, through the vSwitch and the physical NIC, and back through the virtualized network itself. Granular performance measures are vital both for SLA tracking and for making decisions about the optimal placement of tenants and their traffic in the data center infrastructure as a whole. Of course, visibility is intimately linked with another need on the part of cloud providers: to deliver the guarantees of isolation that are required for regulatory compliance. The developing Cloud Audit RFC from Hoff, George Reese of Enstratus, Sam Johnston at Google and Ben Sapiro of Telus should be considered required reading.
Rather than wax lyrical myself, here’s Peter’s note to me, reproduced with his permission and my thanks:
I read your recent blog post where you mentioned that the Open vSwitch will ship as part of the next XenServer release (Cowley). This is great news!
Your articles have understandably focused on OpenFlow and the capabilities that OpenFlow/Open vSwitch provides for network virtualization. This is very timely: I have recently been seeing a lot of interest in OpenFlow from customers and hardware switch vendors who want to use OpenFlow to bring the network under the control of scale-out applications (memcached, Hadoop, etc.).
What the Open vSwitch also provides is continuous visibility into network traffic. I read your paper, “Virtual Switching in an Era of Advanced Edges”, and I thought the performance graphs were very interesting, particularly the improved efficiency when switching local inter-VM traffic. The sFlow instrumentation built into the Open vSwitch provides the global visibility needed to optimize workload placement to exploit these efficiencies.
However, to get a complete picture of XenServer performance, you need more than just the instrumentation in the Open vSwitch. The Host sFlow agent exports hypervisor performance statistics: both physical server statistics (CPU, memory, I/O) and per-VM performance statistics (similar to libxenstat). Since switches, routers and load balancers (Open vSwitch, Vyatta, NetScaler) now exist as software entities, you cannot manage network performance without understanding the performance of the host, since an apparent network performance problem could now be due to a lack of computational resources on the server. The sFlow standard provides the comprehensive, scalable monitoring solution needed to manage performance in converged environments.
In addition to providing physical and virtual server statistics, the Host sFlow agent can automatically configure sFlow in the Open vSwitch, greatly simplifying the task of coordinating performance monitoring across the data center.
The Host sFlow agent has been built and tested on XenServer 5.6. The agent is tiny (around 50K) and imposes a negligible load on the hypervisor (it spends most of its time asleep, waking up occasionally to grab some counters, send a UDP datagram and go back to sleep). The most convenient way of delivering the Host sFlow agent would be to install it by default on XenServer. Is this something you would consider for the upcoming XenServer release?
The Host sFlow agent unlocks the instrumentation that already exists in XenServer and delivers the measurements in a scalable and efficient way using a standard protocol. I think customers will find this a compelling solution, particularly when compared to the proprietary tool stacks, with their limited scalability, offered by competing hypervisors.
For those interested in more detail, the following articles give some background:
Cloud Performance Monitoring
Role of DNS Service Discovery in configuring sFlow
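To make the agent’s duty cycle in Peter’s note concrete (sleep, wake, grab some counters, send a UDP datagram, sleep again), here is a minimal Python sketch. It is not the Host sFlow agent and does not emit real sFlow datagrams: the JSON payload, the function names, the default 20-second interval and the use of Linux’s /proc/net/dev as the counter source are all assumptions made purely for illustration.

```python
import json
import socket
import time


def read_counters(proc_net_dev_text):
    """Parse text in Linux /proc/net/dev format into per-interface byte counters."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "tx_bytes": int(fields[8]),
        }
    return counters


def send_sample(sock, collector, counters):
    """Send one counter snapshot as a single UDP datagram.

    The JSON encoding is a stand-in for illustration; real sFlow uses a
    binary format defined by the sFlow specification.
    """
    payload = json.dumps({"ts": time.time(), "counters": counters}).encode("utf-8")
    sock.sendto(payload, collector)
    return len(payload)


def run_agent(collector, interval_s=20.0, iterations=None, source="/proc/net/dev"):
    """Hypothetical agent loop: read counters, send one datagram, sleep, repeat."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        done = 0
        while iterations is None or done < iterations:
            with open(source) as f:
                send_sample(sock, collector, read_counters(f.read()))
            done += 1
            time.sleep(interval_s)  # asleep for almost all of each cycle
    finally:
        sock.close()
```

Calling `run_agent(("127.0.0.1", 6343))` on a Linux host would send one snapshot per interval to a listener on that address; 6343 is the UDP port conventionally used by sFlow collectors, though such a collector would not understand this JSON stand-in. Because each cycle is one file read and one datagram, the process is idle nearly all the time, which is the property Peter highlights.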