The examples of how big data is changing the world abound. From Nate Silver's famous election data to the latest discussion of the data surrounding "March Madness", the impact of big data on our lives is undeniable. What has been interesting, however, is the focus on how technology, especially cloud computing, has enabled the big data discussion to really take off. If you think big technology has had a profound impact on data, just wait till you see how big data will change the landscape of technology.

[Image: The original tapes used for storing Apollo space mission data.]

Anyone who has paid their dues in IT also knows that infrastructure produces a wealth of data. Terabytes of syslog data are generated daily, NetFlow data feeds an ample number of graphs, and real-time SNMP data triggers pagers and mobile phones to buzz at all hours of the night. Unfortunately, most of this data is logged, used for its immediate operational value, and then promptly discarded. At best, SNMP data is kept long enough to identify basic trends in areas such as bandwidth utilization.

But this is changing. Specifically, two aspects of big data in infrastructure are changing: (1) the smarts of the data, and (2) what we do with the data.

The syslog bucket has largely been just that – a bucket to throw miscellaneous data into. The rationale for one log over another is generally left to the whim of the developer. SNMP and NetFlow have followed similar patterns, with SNMP having the distinction of the greatest strategic potential. As we begin to explore what else can be done with this data for analytics and correlation, infrastructure builders such as Citrix NetScaler start asking how to make the underlying information more useful. For us, one key step was developing AppFlow for HTTP, which added application data to IPFIX records and enabled administrators to tie a wealth of packet- and flow-level data to specific application transactions – a link that had never before been offered in web environments.
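To make the idea concrete, here is a minimal sketch of what that link looks like in practice: flow records that carry HTTP fields can be grouped by transaction, so every hop a request takes is tied back to one application transaction. The record shape and field names below are illustrative assumptions, not the actual AppFlow/IPFIX information elements.

```python
# A minimal sketch of the idea behind AppFlow for HTTP: flow-level
# records carry application fields, so packet/flow data can be tied
# back to a specific HTTP transaction. Field names are illustrative.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class AppFlowRecord:
    # Classic IPFIX flow keys
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    octets: int
    # Application fields added per the AppFlow-for-HTTP idea (hypothetical names)
    http_method: str
    http_url: str
    transaction_id: int

def group_by_transaction(records):
    """Collect every flow record that belongs to the same HTTP transaction."""
    transactions = defaultdict(list)
    for rec in records:
        transactions[rec.transaction_id].append(rec)
    return transactions

records = [
    AppFlowRecord("10.0.0.5", "10.0.1.9", 51200, 80, 4096, "GET", "/checkout", 7001),
    AppFlowRecord("10.0.1.9", "10.0.2.3", 44310, 8080, 2048, "GET", "/checkout", 7001),
]

for txn_id, recs in group_by_transaction(records).items():
    hops = " -> ".join(f"{r.src_ip}:{r.src_port}" for r in recs)
    print(f"transaction {txn_id} ({recs[0].http_url}): {hops}")
```

The payoff is in the grouping step: once flow records carry a transaction identifier, a question like "which flows served this slow checkout request?" becomes a simple lookup instead of guesswork across separate tools.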

With this data being generated, vendors in the analytics space have begun to ask what else is possible. This has been especially interesting for vendors that focus on cross-domain data collection, which lets them correlate data from a variety of devices. For enterprise administrators, being able to cross-connect data from their entire datacenter is a powerful capability, one that has the potential to lead to smarter design decisions as well as increased efficiency in implementation and purchasing. The bottom line is that of the various technologies we've adopted in the datacenter over the last several years, few have the potential to be as far-reaching in how we run our businesses as analytics.
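As a rough illustration of what cross-domain correlation means, the sketch below pairs syslog events with flow records from other devices by shared IP address and a small time window. The record shapes, field names, and window size are assumptions for illustration, not any particular product's schema.

```python
# A hedged sketch of cross-domain correlation: matching syslog events
# against flow records from other devices by shared IP and time window.
from datetime import datetime, timedelta

syslog_events = [
    {"ts": datetime(2013, 3, 20, 14, 5, 2), "host": "10.0.0.5",
     "msg": "sshd: failed password"},
]
flow_records = [
    {"ts": datetime(2013, 3, 20, 14, 5, 1), "src": "10.0.0.5",
     "dst": "10.0.1.9", "octets": 4096},
    {"ts": datetime(2013, 3, 20, 16, 0, 0), "src": "10.0.3.7",
     "dst": "10.0.1.9", "octets": 512},
]

WINDOW = timedelta(seconds=30)  # how close in time two records must be

def correlate(events, flows):
    """Pair each syslog event with flows from the same host, close in time."""
    for ev in events:
        for fl in flows:
            if fl["src"] == ev["host"] and abs(fl["ts"] - ev["ts"]) <= WINDOW:
                yield ev, fl

for ev, fl in correlate(syslog_events, flow_records):
    print(f"{ev['msg']} on {ev['host']} correlates with a flow to {fl['dst']}")
```

The same join applies across any pair of domains – SNMP counters against application logs, flow data against configuration changes – which is what makes the cross-domain collectors so interesting.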

As breadth options for processing this mountain of data unfold, a new generation of analytics is set to emerge that looks at depth. Specifically, how can deeper insight into a short list of applications provide a powerful addition to breadth tools? Put another way, once I know that a specific device or piece of software is part of a key flow that needs further analysis, how do I get that analysis? This is where we'll see the next generation of tools emerge as additions to current capabilities.
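One way to picture that breadth-then-depth pattern: a cheap breadth pass flags the flows worth a closer look, and only those are handed to a deeper analysis step. Everything here – the threshold, the fields, and the transaction store – is a hypothetical stand-in, not a real tool's interface.

```python
# A sketch of the breadth-then-depth pattern: a cheap first pass flags
# interesting flows; only flagged flows get expensive deep analysis.
def breadth_scan(flows, octet_threshold=1_000_000):
    """Cheap first pass: flag flows that look worth a closer look."""
    return [f for f in flows if f["octets"] > octet_threshold]

def depth_analysis(flow, transaction_store):
    """Deep second pass: pull per-transaction detail for one flagged flow."""
    key = (flow["src"], flow["dst"])
    return transaction_store.get(key, [])

flows = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "octets": 5_000_000},
    {"src": "10.0.3.7", "dst": "10.0.1.9", "octets": 800},
]
# Hypothetical store of detailed transaction records, keyed by flow endpoints
transaction_store = {
    ("10.0.0.5", "10.0.1.9"): [
        {"url": "/export/report", "latency_ms": 2400},
    ],
}

for flow in breadth_scan(flows):
    for txn in depth_analysis(flow, transaction_store):
        print(f"{flow['src']} -> {flow['dst']}: {txn['url']} ({txn['latency_ms']} ms)")
```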

Stay tuned. There is a lot more to come.