I get a lot of questions about configuring and sizing XenDesktop, so I thought I’d share some of my storage ideas and experiences on the subject, starting with IntelliCache.
For starters, you should read the available information in the admin guides and such to get an overview.
I would also suggest you read the following blog post by Christian Ferber: /blogs/2011/06/22/xendesktop-and-local-storage-intellicache/
That should give you a great start and probably the information needed to get going.

If you want to scale a XenDesktop/XenApp environment, one of the biggest challenges in delivering a great user experience is going to be disk performance.
The thing I have always had a beef with is that, when using shared pooled images, the write cache is basically just temporary storage to keep the session alive.
Putting this temporary data on expensive shared storage never made any sense to me.
And to get any performance and scalability out of your XenDesktop setup you had to go for _really_ expensive enterprise SSDs.
Very annoying and perhaps not the easiest of sales.

I am sure many of you have discovered the almost magical performance increase in going from a mechanical drive to a solid state one.
I was blown away the first time I used a laptop equipped with a solid-state drive.
This is mainly due to latency being far lower in SSDs; it is not so much about throughput.
The super-low latency comes from not having a mechanical arm swinging around to different places on the platters, and it enables far more input/output operations per second (IOPS).
This is where the real performance increase over mechanical drives comes from.
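
To make the latency argument concrete, here is a back-of-the-envelope sketch in Python. The latency figures are typical textbook numbers, not measurements of any particular drive:

```python
# Upper bound on random IOPS for a single outstanding request:
# one I/O completes per average service time.
def max_random_iops(service_time_s: float) -> float:
    return 1.0 / service_time_s

# 15,000 RPM disk: ~2 ms average rotational latency (half a revolution
# at 250 rev/s) plus roughly 3.5 ms average seek time.
hdd_service_time = 0.002 + 0.0035

# A typical SSD serves a random read in ~0.1 ms: no head to move,
# no platter to wait for.
ssd_service_time = 0.0001

print(f"15K RPM disk: ~{max_random_iops(hdd_service_time):.0f} IOPS")  # ~180
print(f"SSD (QD=1):   ~{max_random_iops(ssd_service_time):.0f} IOPS")  # ~10000
```

And since an SSD can service many requests in parallel, real-world figures climb well past that queue-depth-1 number.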

A real-world example

Let me give you an example of a production environment I have worked with extensively.
The setup is as follows:

  • 1,500 users, mostly task workers
  • A shared Windows XP disk image
  • All applications virtualized and streamed to the client (about 60 in total)
  • None of the applications would be regarded as I/O intensive

The resulting figures on the shared storage are:

  • ~5,000 IOPS on average
  • ~25 MB/s throughput

Quite telling, right?

Not many megabytes get pushed around, but the workload still manages to generate thousands of I/O operations.
Now consider this in relation to a 15,000 RPM enterprise SAS disk.
It gives you maybe 200-250 IOPS, at best (no random writes, no fragmentation, the stars are aligned, etc.).

Compare that to the figures of a modern SSD, with read IOPS approaching 100,000 (yes, one hundred thousand) and ~50,000 random write IOPS.
You will need a lot of mechanical drives to make up for one SSD in terms of IOPS, that’s for sure.
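
To put these numbers side by side, here is a quick calculation using the figures from the example above. The per-drive ratings are the rough assumptions already stated in the text, not benchmark results:

```python
users          = 1500
workload_iops  = 5000    # measured average on the shared storage
throughput_mb  = 25      # measured average throughput, MB/s

avg_io_kb     = throughput_mb * 1024 / workload_iops  # ~5 KB per operation
iops_per_user = workload_iops / users                 # ~3.3 IOPS per user

sas_iops       = 225     # optimistic figure for one 15K RPM SAS disk
ssd_write_iops = 50000   # conservative random-write figure for one SSD

print(f"Average I/O size:       ~{avg_io_kb:.1f} KB")               # ~5.1
print(f"IOPS per user:          ~{iops_per_user:.1f}")              # ~3.3
print(f"SAS disks for workload: ~{workload_iops / sas_iops:.0f}")   # ~22
print(f"SAS disks per SSD:      ~{ssd_write_iops / sas_iops:.0f}")  # ~222
```

In other words, the workload consists of many tiny writes, and a single consumer SSD covers it many times over on its own.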

Consumer drives, you say?

I like to think of it this way: an enterprise SSD has ten times the lifespan at ten times the cost. Are you really sure you even want the longer lifespan?
Within that tenfold lifespan, an even cheaper and far superior drive will probably have surfaced anyway.
And we’re still only storing temporary data!

Now, if you read Mr. Ferber’s previous blog post, he even goes so far as to pitch the idea of RAID-0.
This further proves my point that we simply do not have to be that careful with the write cache data.
Or, if you must, why not find a middle way, something like RAID-1+0 (RAID-10) using consumer drives?
That way you can afford to lose a drive and still have full performance on the node.
It’s something to think about alright!
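
As a toy illustration of that trade-off, here is a sketch comparing the two layouts; the drive capacity and IOPS figures are made-up placeholders:

```python
def raid0(n_drives: int, capacity_gb: int, write_iops: int) -> dict:
    # Pure striping: all the capacity and performance, zero fault tolerance.
    return {"capacity_gb": n_drives * capacity_gb,
            "write_iops": n_drives * write_iops,
            "survives_drive_failure": False}

def raid10(n_drives: int, capacity_gb: int, write_iops: int) -> dict:
    # Mirrored pairs, then striped: half the usable capacity and roughly
    # half the aggregate write IOPS (each write hits both mirrors), but
    # any single drive can fail without taking the node down.
    assert n_drives % 2 == 0, "RAID-10 needs an even number of drives"
    return {"capacity_gb": n_drives // 2 * capacity_gb,
            "write_iops": n_drives // 2 * write_iops,
            "survives_drive_failure": True}

# Four hypothetical consumer SSDs, 256 GB and 50,000 write IOPS each:
print(raid0(4, 256, 50000))   # 1024 GB, 200,000 IOPS, no redundancy
print(raid10(4, 256, 50000))  #  512 GB, 100,000 IOPS, loses a drive gracefully
```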

Resilience and scaling

The use of IntelliCache is also a small step towards a “shared nothing” concept (or at least it brings some of its benefits).
By that I mean small, independent units of storage that together form a greater whole.
Each unit has a given performance and size.
This makes for simple horizontal scaling of both storage capacity and performance: simply add a node and you get more storage space and more IOPS.
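
As a trivial model of what that means, assuming some made-up per-node figures:

```python
# One storage "unit": a hypervisor host with local SSDs for the write cache.
# The figures are placeholders, not a recommendation.
node = {"capacity_gb": 512, "write_iops": 50000}

def cluster(n_nodes: int) -> dict:
    # Shared-nothing scaling: totals grow linearly with the node count.
    return {key: value * n_nodes for key, value in node.items()}

print(cluster(4))  # {'capacity_gb': 2048, 'write_iops': 200000}
print(cluster(8))  # double the nodes, double the capacity and the IOPS
```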

A SAN on the other hand is, in my opinion, not really well suited for this kind of horizontal scaling.
If you suddenly have to add, say, 10,000 users to a service that is storage dependent (XenDesktop, for example), that storage had better be cheap and easy to scale.
There is also the question of resilience; arguments could be made that a SAN becomes a single point of failure in the service delivery.

Using IntelliCache today, you will still need shared storage of some sort to hold the golden image(s) and to take some of the load if a local IntelliCache repository fails.
Still, this lets you heavily reduce the need for both shared storage capacity and shared storage performance.
Why not use several smaller NFS nodes, say one for each XenServer pool? Basically, build “units” or “blocks” of hardware, hypervisor, and storage.
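
As a closing sketch, here is one hypothetical way to size such a block, reusing the ~3.3 IOPS per user measured in the production example earlier; the SSD rating and the headroom factor are assumptions:

```python
iops_per_user = 3.3      # from the 1,500-user production example above
node_ssd_iops = 50000    # assumed random-write rating of the local SSD(s)
headroom      = 0.5      # only plan for half the rated IOPS, to be safe

users_per_block = int(node_ssd_iops * headroom / iops_per_user)
print(f"Storage-wise, one block could host ~{users_per_block} users")  # ~7575
# In practice, CPU and RAM on the host, not the local write cache,
# will become the limiting factor long before that.
```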