In the history of computing, one thing that long eluded the industry was a consistent character encoding scheme.
There have been many attempts to standardize character encoding, but each had significant drawbacks until about 20 years ago.
UTF-8, designed by Ken Thompson and Rob Pike in 1992 as an encoding for Unicode, solved the original problem of a universal encoding. It can represent every Unicode character using one to four bytes, so plain ASCII text needs only one byte per character. It also preserves compatibility with ASCII: any ASCII text is already valid UTF-8. The other solutions either lacked full coverage or used more storage.
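| Encoding | Year | Bytes per character | Notes |
|---|---|---|---|
| UTF-8 | 1992 | 1-4 | Compact for ASCII-heavy text and ASCII compatible |
| ASCII | 1963 | 1 | Early standard, made popular by the PC |
| UCS-2 | 1993 | 2 | First Unicode encoding |
| UTF-16 | 1996 | 2 or 4 | Extends the UCS-2 model to reach more characters |
| UTF-32 | 1996 | 4 | Simple, but uses a lot of memory |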
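The variable-width scheme described above is simple enough to sketch. The following Python function is an illustration written for this post, not code from Linux VDA (in practice you would just call `str.encode("utf-8")`). It shows how a code point is packed into one to four bytes and why ASCII characters pass through unchanged.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative, not optimized)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:
        # 0xxxxxxx: ASCII is stored as-is, which gives ASCII compatibility.
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aé語😀":
    print(ch, utf8_encode(ord(ch)).hex(" "))
```

Running it prints `41`, `c3 a9`, `e8 aa 9e`, and `f0 9f 98 80`: one, two, three, and four bytes respectively.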
Because of timing, Windows adopted UCS-2 first, starting with Windows NT in 1993.
This was seen as an improvement over the traditional single-byte character sets of DOS and Windows 3.x. Unfortunately, supporting both Unicode and non-Unicode models added complexity to Windows development and support. It led to the classic doubling of the Windows API: one version of each function ends in 'W' (wide, meaning Unicode) and the other in 'A' (ANSI, meaning the active code page). A brief explanation can be found here.
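To see why the 'A' functions were a liability, here is a minimal Python sketch (not Windows API code) that encodes a string the way each API family would receive it. It assumes the ANSI code page is Windows-1252 (`cp1252`), a common Western European default; the helper `encode_for_api` is hypothetical.

```python
def encode_for_api(text: str, family: str) -> bytes:
    """Encode text as an 'A' (ANSI) or 'W' (wide) Windows API expects it.

    Assumption: the ANSI code page is cp1252; 'W' APIs take UTF-16LE.
    """
    if family == "A":
        # 'A' functions take bytes in the active ANSI code page, so any
        # character outside that code page cannot be represented.
        return text.encode("cp1252")
    if family == "W":
        # 'W' functions take UTF-16 in little-endian byte order, no BOM.
        return text.encode("utf-16-le")
    raise ValueError(f"unknown API family: {family!r}")


print(encode_for_api("café", "A"))  # b'caf\xe9'
print(encode_for_api("café", "W"))
try:
    encode_for_api("日本語", "A")
except UnicodeEncodeError:
    print("cp1252 cannot represent 日本語; only the 'W' path works")
```

Every program that called the 'A' variants inherited this limitation, which is why Unicode support on Windows meant maintaining both paths.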
Citrix XenApp and XenDesktop are built on the Windows model, and even the Citrix Linux VDA has its roots in Windows encoding schemes. This made sense historically: the products grew up on Windows, and sharing code between platforms was practical.
This approach changed with Linux VDA version 1.1. Instead of preserving the Microsoft encoding schemes, Linux VDA now converts everything to UTF-8 internally. At first this might not seem relevant to customers, but it brings some immediate benefits and lays the groundwork for longer-term improvements.
Because UTF-8 is now the core encoding in Linux VDA, strings no longer need to be converted back and forth internally. This slightly improves performance and reduces the risk of losing data in translation. It also reduces the memory that mostly-ASCII strings need, since UTF-8 stores each ASCII character in one byte rather than the two bytes UTF-16 uses.
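The size difference is easy to measure. This short Python sketch compares the encoded length of a few sample strings; the samples are arbitrary and not taken from Linux VDA.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return how many bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }


for sample in ["administrator", "Müller", "日本語"]:
    print(sample, encoded_sizes(sample))
# administrator {'utf-8': 13, 'utf-16': 26}
# Müller {'utf-8': 7, 'utf-16': 12}
# 日本語 {'utf-8': 9, 'utf-16': 6}
```

The last line shows the trade-off: CJK characters take three bytes each in UTF-8 versus two in UTF-16, so the memory savings apply mainly to ASCII-heavy strings.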
Another benefit is that Linux VDA now uses the encoding native to modern Linux desktops. Messages from administrators can now be displayed with full Unicode support: even though a message arrives from Studio in UTF-16, it is converted to UTF-8 and displayed using GTK+.
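The conversion step can be sketched in a few lines of Python. The helper `studio_message_to_utf8` is hypothetical, and it assumes the payload is little-endian UTF-16 without a byte order mark; the actual format Linux VDA receives is not described here. GTK+ expects UTF-8 strings, which is why the output is UTF-8.

```python
def studio_message_to_utf8(payload: bytes) -> bytes:
    """Convert a UTF-16LE message payload to UTF-8 bytes for display.

    Hypothetical helper: assumes UTF-16LE with no BOM.
    """
    # errors="strict" makes a malformed payload fail loudly instead of
    # silently turning into replacement characters.
    text = payload.decode("utf-16-le", errors="strict")
    return text.encode("utf-8")


message = "Wartung läuft, bitte abmelden"
print(studio_message_to_utf8(message.encode("utf-16-le")).decode("utf-8"))
```

Failing on malformed input is a design choice for this sketch, not necessarily what Linux VDA does.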
Another reason to use UTF-8 is that full Unicode text can now be transferred through the clipboard. Even though the clipboard receives UTF-16 text, it is automatically converted to UTF-8 for Linux applications. This matters especially for Asian languages, which typically have large character sets.
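Clipboard data adds one wrinkle: Windows' `CF_UNICODETEXT` format is UTF-16LE ending in a NUL character. The Python sketch below assumes incoming clipboard text follows that convention, which is an assumption about framing rather than a description of Linux VDA's clipboard channel; `clipboard_utf16_to_utf8` is a hypothetical name.

```python
def clipboard_utf16_to_utf8(buffer: bytes) -> bytes:
    """Convert a UTF-16LE clipboard buffer to UTF-8 for Linux applications.

    Assumption: the buffer follows the CF_UNICODETEXT convention (UTF-16LE,
    possibly ending in a NUL terminator).
    """
    # Raises UnicodeDecodeError on odd-length or malformed input.
    text = buffer.decode("utf-16-le")
    # Drop the trailing NUL terminator; Linux clipboard text does not use it.
    text = text.rstrip("\x00")
    return text.encode("utf-8")


clip = "中文 한국어 日本語".encode("utf-16-le") + b"\x00\x00"
print(clipboard_utf16_to_utf8(clip).decode("utf-8"))
```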
A side benefit is improved username handling: usernames can now include non-ASCII characters. This was tested as part of Linux VDA 1.1 testing with a username containing characters outside the Basic Multilingual Plane (BMP), that is, code points above U+FFFF. Such characters are fairly rare but valid.
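Characters beyond the BMP are where UCS-2 assumptions break: UTF-16 needs two 16-bit code units (a surrogate pair) for each of them, so code that treats every 16-bit unit as one character splits them in half. This Python sketch computes the surrogate pair by hand for U+20BB7 (𠮷, a CJK character sometimes used in Japanese names); the function name is illustrative.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code unit(s) for a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        return [code_point]               # BMP: a single 16-bit unit
    offset = code_point - 0x10000         # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


ch = "\U00020BB7"
print([hex(u) for u in utf16_code_units(ord(ch))])  # ['0xd842', '0xdfb7']
print(ch.encode("utf-8").hex(" "))                  # f0 a0 ae b7
```

In UTF-8 the same character is simply a four-byte sequence, with no special pairing rules to get wrong.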
Beyond these changes, the logging and tracing components now use UTF-8, so log messages and trace output can contain any character. More work is still needed to localize log messages into languages other than English, but the groundwork is in place and UTF-8 content displays correctly.
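The general idea of a UTF-8 log sink can be illustrated with Python's standard `logging` module. This is only an analogy, not the Linux VDA logging component; the logger name, file name, and format string are hypothetical. The key point is opening the log file as UTF-8 explicitly instead of relying on the locale default.

```python
import logging
import os
import tempfile


def make_utf8_logger(path: str, name: str = "vda-sketch") -> logging.Logger:
    """Return a logger whose file output is always encoded as UTF-8."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    # Without encoding=, FileHandler uses the locale's default encoding,
    # which may not be UTF-8.
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "trace.log")
logger = make_utf8_logger(log_path)
logger.info("Session started for user %s", "山田太郎")

# Close the handler so the file is fully written before reading it back.
for handler in list(logger.handlers):
    handler.close()
    logger.removeHandler(handler)

with open(log_path, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8"))
```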
Internally, the change simplified the code in many places, which makes string handling more consistent and reduces problems with conversion.
Looking ahead, this gives Linux VDA a foundation for better support of any language. It also opens the possibility of a single server serving users in different languages at the same time. The biggest potential is full integration with Linux text handling.
To read more from the Linux Virtual Desktop Team, check out all of our posts here.