It’s been a core Citrix theme for years that “it’s all about the apps”… But what are the apps all about? Data. And from that data, information that can drive both the transactional and strategic life of the business.
Big Data and not-so-big data
As commerce – both business-to-consumer and business-to-business – has moved to the Web, it has driven dramatic growth in structured data: the records of the transactions themselves, plus the financial, personnel, and strategic planning information those transactions feed. Volumes of structured data have consistently grown by 20% or more per year, and continue to do so.
It’s not only the transition of business to the Web that has driven this growth, but also the increasing use of storage-intensive content such as video. And the need to deliver this information not only over performance-optimized corporate networks but also over distributed public wireless networks to mobile devices has made optimizing information delivery even more pressing.
The same principles that apply to business information also extend to the “business of life” – as we take advantage of Twitter, Facebook, YouTube, and other social networks and social media not only in our work lives but also in our personal lives – often with no boundary between the two worlds. The growth in unstructured data – everything from email and documents to knowledge bases, podcasts, and video – is dramatically outstripping that of structured data, with IDC estimating average annual growth rates of 61%.
With this growth in data, both structured and unstructured, comes the need to manage it and to deliver it efficiently, securely, and reliably. While each class of information is important in its own right, the greatest value comes from managing and delivering the two together as needed. Some technology companies have focused on SQL databases and related data management tools, while others have focused on meaning-based processing and other tools for deriving value from unstructured data. But the greatest opportunity lies in integrating the two: optimizing, analyzing, accelerating, and securing structured and unstructured data with common tools.
Each domain of data brings with it major challenges to scaling, optimization, availability, and security.
For structured data, the challenge is overcoming the limits to scalability. The traditional approach to increasing availability and performance is based on scale-up architectures (such as Oracle RAC) that increase the processing capability attached to a single database storage instance and use RAID to protect that single copy of the data. Scale-out architectures can add greater levels of protection (such as disaster recovery) and processing power, but they are limited both by the need to manage replication and by the inability to effectively monitor and manage the networks across which they are distributed.
The most commonly used unstructured data management tools (such as Hadoop) were designed from scratch to exploit scale-out for massive replication and optimization. What they lack, however, are the capabilities businesses need to provide enterprise-class high availability, to secure the information, and to offer access control and auditability for compliance. Ask anyone with Hadoop chops what they had to do to work around a NameNode failure (the HDFS NameNode has historically been a single point of failure) and you’ll see what I mean.
What, then, will it take for the structured and unstructured data distributed across the Internet to reach the levels of performance, availability, and security required as scale explodes?
The delivery infrastructure must be able to support horizontal scale-out as well as scale-up, and to provide reliability and security, using tools that are aware of both the language and formats of structured data and the nature of unstructured data. Such an optimized data delivery infrastructure has a set of capabilities that should seem familiar: it mirrors the features that an application delivery controller (ADC) infrastructure provides to apps, but delivers them in ways that are designed and optimized not for the apps but for the information they use and process.
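To make “aware of the language of structured data” a little more concrete, here is a minimal Python sketch of one ADC-style feature applied to the data tier: read/write splitting across a primary database and its replicas. It is an illustration under stated assumptions, not how NetScaler DataStream works. The backend names `db-primary`, `db-replica-1`, and `db-replica-2` and the `Session` class are hypothetical, and statements are classified with deliberately naive regular expressions rather than a real SQL parser.

```python
import itertools
import re

# Hypothetical backend names, used for illustration only.
PRIMARY = "db-primary"
REPLICAS = ["db-replica-1", "db-replica-2"]

# Naive statement classifiers (a real proxy would parse the SQL properly).
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|EXPLAIN)\b", re.IGNORECASE)
_LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b", re.IGNORECASE)
_BEGIN = re.compile(r"^\s*(BEGIN|START\s+TRANSACTION)\b", re.IGNORECASE)
_END = re.compile(r"^\s*(COMMIT|ROLLBACK)\b", re.IGNORECASE)


class Session:
    """Hypothetical routing state for one client connection."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.in_transaction = False
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def route(self, sql: str) -> str:
        """Return the backend that should receive this statement."""
        if _BEGIN.match(sql):
            self.in_transaction = True
            return self.primary
        if _END.match(sql):
            self.in_transaction = False
            return self.primary
        # Inside an explicit transaction, stay on the primary so reads
        # see the transaction's own uncommitted writes.
        if self.in_transaction:
            return self.primary
        # Plain reads are spread across replicas; locking reads and
        # everything else (INSERT, UPDATE, DDL, ...) go to the primary.
        if _READ_ONLY.match(sql) and not _LOCKING_READ.search(sql):
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    session = Session(PRIMARY, REPLICAS)
    for stmt in [
        "SELECT * FROM orders",                              # db-replica-1
        "UPDATE orders SET status = 'shipped' WHERE id = 7",  # db-primary
        "BEGIN TRANSACTION",                                 # db-primary
        "SELECT * FROM orders",                              # db-primary
        "COMMIT",                                            # db-primary
        "SELECT 1",                                          # db-replica-2
    ]:
        print(session.route(stmt))
```

Pinning a session to the primary for the life of an explicit transaction is what makes the routing transaction-aware; without it, a read issued after an uncommitted write could land on a replica that cannot see that write. Even outside transactions, replication lag means a replica may briefly serve stale data, and a statement such as T-SQL `SELECT ... INTO` (which creates a table) would be misclassified as a read by this naive sketch.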
The key to this, however, is “service awareness”: much as an ADC natively understands application traffic (e.g., SAP, SharePoint), the infrastructure in front of the data tier needs native interpretation of SQL and of database protocols such as Microsoft TDS to truly add value. Simply adding a policy or script to superficially provide this capability is not a production solution, because it does not scale. To that end, Citrix is introducing NetScaler DataStream technology, the first natively integrated data acceleration technology that extends the scalability, availability, and security benefits the ADC brings to the web tier to the data tier as well. NetScaler DataStream addresses this challenge by inspecting data traffic in real time and applying optimization and security policies in a manner that is aware of data formats, protocols, and transactions.
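What native interpretation of a protocol like TDS involves can be sketched at its lowest layer. Per Microsoft’s public MS-TDS specification, every TDS message travels in packets that start with an 8-byte header (type, status, length, SPID, packet ID, window), with the length and SPID in big-endian byte order. The Python sketch below decodes that header and extracts the SQL text from a SQL batch packet. It is not NetScaler code, the helper names are my own, and it assumes a TDS 7.2+ client (whose batches begin with an ALL_HEADERS block), an unencrypted stream, and a batch that fits in a single packet; a production parser would also need to reassemble multi-packet messages and terminate TLS.

```python
import struct
from typing import NamedTuple

# Subset of packet type codes defined in the MS-TDS specification.
PACKET_TYPES = {
    0x01: "SQL_BATCH",
    0x03: "RPC",
    0x04: "TABULAR_RESULT",
    0x06: "ATTENTION",
    0x07: "BULK_LOAD",
    0x0E: "TRANSACTION_MANAGER",
    0x10: "LOGIN7",
    0x12: "PRELOGIN",
}
STATUS_EOM = 0x01  # status bit: last packet of the message
HEADER = struct.Struct(">BBHHBB")  # type, status, length, SPID, packet ID, window


class TdsHeader(NamedTuple):
    packet_type: str
    end_of_message: bool
    length: int  # total packet length, header included
    spid: int
    packet_id: int


def parse_header(packet: bytes) -> TdsHeader:
    """Decode the fixed 8-byte TDS packet header."""
    if len(packet) < HEADER.size:
        raise ValueError("truncated TDS header")
    ptype, status, length, spid, packet_id, _window = HEADER.unpack_from(packet)
    return TdsHeader(
        PACKET_TYPES.get(ptype, f"UNKNOWN(0x{ptype:02x})"),
        bool(status & STATUS_EOM),
        length,
        spid,
        packet_id,
    )


def sql_batch_text(packet: bytes) -> str:
    """Return the SQL text of a single-packet SQL batch (TDS 7.2+ assumed)."""
    hdr = parse_header(packet)
    if hdr.packet_type != "SQL_BATCH" or not hdr.end_of_message:
        raise ValueError("not a complete single-packet SQL batch")
    if len(packet) < hdr.length:
        raise ValueError("truncated TDS packet")
    payload = packet[HEADER.size:hdr.length]
    # ALL_HEADERS starts with a little-endian DWORD holding its total length
    # (including that DWORD itself); the SQL text follows as UTF-16LE.
    (all_headers_len,) = struct.unpack_from("<I", payload)
    return payload[all_headers_len:].decode("utf-16-le")


if __name__ == "__main__":
    # Minimal ALL_HEADERS with one transaction descriptor header:
    # TotalLength 22, HeaderLength 18, HeaderType 2, descriptor 0, 1 request.
    all_headers = struct.pack("<IIHQI", 22, 18, 2, 0, 1)
    payload = all_headers + "SELECT 1".encode("utf-16-le")
    packet = HEADER.pack(0x01, STATUS_EOM, HEADER.size + len(payload), 0, 1, 0) + payload
    print(parse_header(packet))
    print(sql_batch_text(packet))  # SELECT 1
```

Once the statement text is visible, policies can act on the actual query (for example, routing reads to replicas or rejecting disallowed statements) rather than on opaque bytes.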
I am particularly excited about this capability, as it will serve as a foundation for ongoing innovation in optimized data delivery for both structured and unstructured information in future releases. For more detailed information about NetScaler DataStream technology, see Craig Elrod’s blog.
The landscape has shifted: information is now the key asset, and its explosion is IT’s central management challenge. Wherever and however information is stored, wherever and however users need to access it, NetScaler DataStream technology delivers the tools to accelerate, protect, and secure that access.