By offering XenApp and XenDesktop as both an on-premises and a Cloud offering, Citrix engineering teams are evolving best practices to improve developer productivity and the sustainability of the product. Customers of both the on-premises and Cloud offerings, as well as the engineers themselves, all benefit from this arrangement.
One of the more technical improvements has been to refine the way new feature development takes place. Through a few subtle changes, we are now able to develop new features more quickly, while maintaining a focus on quality and customer experience.
Feature Development through Toggles
Engineering productivity plays a huge role in the success of the product. The more efficient developers are with their time, the more time they will have to focus on detail and new features. A key opportunity we found was improving the process by which multiple development teams combine their work into a shared codebase. Each development team implements a new feature in a dedicated feature branch. Across all teams, there will be several features in progress at the same time. This can be visualized below.
Each feature branch may only be integrated into the main codebase once the feature has completed quality testing. This practice of large and infrequent integrations causes a few problems:
- Large-scale merge conflicts each time a feature stream is integrated into Master.
- Delays between a feature being complete and the feature being available, because it needs to ship as part of a larger release.
- Nested features that share a dependent code change.
As we increased release frequency, the above problems became more expensive and the team realized it was time to improve.
Development teams began to gate their new features behind feature toggles – runtime conditionals – in their code. By effectively disabling the new code, teams were able to perform smaller, more frequent integrations to Main. This drastically reduced the number of “large scale” merge conflicts, which meant new features were added more quickly and with greater stability.
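To make the idea of a runtime conditional concrete, here is a minimal sketch in Python. The toggle name, the in-memory toggle store and the `launch_session` function are all hypothetical and used only for illustration; they are not taken from the XenApp and XenDesktop codebase.

```python
# Hypothetical in-memory toggle store (an assumption for this sketch).
FEATURE_TOGGLES = {
    "new_session_launcher": False,  # code is merged, but the feature is off
}


def is_enabled(feature: str) -> bool:
    """Return the toggle state; unknown toggles default to off."""
    return FEATURE_TOGGLES.get(feature, False)


def launch_session(user: str) -> str:
    """Pick the code path at runtime based on the feature toggle."""
    if is_enabled("new_session_launcher"):
        # New code path: integrated early, only reachable once toggled on.
        return f"new launcher for {user}"
    # Existing code path: unchanged while the toggle is off.
    return f"legacy launcher for {user}"
```

Because the new path is unreachable while the toggle is off, partially finished work can be integrated in small pieces without changing what users see.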
Switching on the Feature Toggles
Now that we had a mechanism for disabling new feature code, we needed a strategy for actually enabling the feature toggle once the feature is ready for release. We came up with the rollout strategy below, which is intentionally gradual. This allows us to monitor our infrastructure and performance and quickly pause the rollout if any anomalies are detected.
Stage 1: Test Team – After the development team declares the feature complete, it goes through a final round of Quality Assurance testing.
Stage 2: Dogfood Validation – Within Citrix, we have a team maintaining an internal XenApp & XenDesktop deployment for engineering use. We enable the feature for this deployment and run a battery of monitoring and performance testing on an environment with real users.
Stage 3: Internal Customers – Many Citrix employees have personal setups they use for activities such as team use, presentations and demos. This pool of users is granted early access for new features.
Stages 4/5/6: External customers – By this point, we have done exhaustive internal testing and determined the feature is ready for external rollout. We gradually roll out the feature to customers, starting with early adopters that opt in and ending with all customers.
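The stages above can be sketched as an ordered enumeration, where a feature is visible to an audience once the rollout has reached that audience's stage. This is only an illustration under stated assumptions: the stage names are paraphrased from the list above, the name given to stage 5 is a guess (the text only describes the first and last external stages), and the `healthy` flag stands in for whatever monitoring signal is actually used.

```python
from enum import IntEnum


class Stage(IntEnum):
    OFF = 0
    TEST_TEAM = 1
    DOGFOOD = 2
    INTERNAL_CUSTOMERS = 3
    EARLY_ADOPTERS = 4   # external customers that opted in
    MORE_CUSTOMERS = 5   # assumed name for the intermediate external stage
    ALL_CUSTOMERS = 6


def sees_feature(rollout: Stage, audience: Stage) -> bool:
    """An audience sees the feature once the rollout has reached its stage."""
    return Stage.OFF < audience <= rollout


def next_stage(current: Stage, healthy: bool) -> Stage:
    """Advance one stage when monitoring is healthy; otherwise pause in place."""
    if not healthy or current is Stage.ALL_CUSTOMERS:
        return current
    return Stage(current + 1)
```

For example, with the rollout at `Stage.DOGFOOD`, the test team and the dogfood deployment see the feature, while internal and external customers do not.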
What about On-Premises Releases?
This is actually a practice we have followed for previous on-premises releases. Early access to features can be configured via registry settings or PowerShell configuration. The only difference now is that the feature is enabled in the Cloud offerings prior to being enabled by default in the on-premises offerings.
Tools & Operations
The technique of rolling out a new feature gradually is known as a canary release. It provides additional risk mitigation to complement internal testing. To simplify the canary release, the Operations team developed a Feature Canary UI.
Each feature toggle has a corresponding entry in this webpage. Ops works closely with the development team, alternating between increasing the canary percentage and monitoring the infrastructure. Any change in the Feature Canary updates the infrastructure in real time. This allows Operations to quickly disable a buggy feature if the need ever arises. In the rare event that a customer experiences an issue due to a buggy feature, the customer is moved into the opt-out category to reduce the likelihood of a second occurrence.
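A canary percentage with an opt-out list can be sketched as follows. The text does not say how customers are selected for a given percentage, so the hash-based bucketing here is an assumption, chosen because it keeps each customer's assignment stable as the percentage grows. All function and parameter names are hypothetical.

```python
import hashlib


def bucket(feature: str, customer_id: str) -> int:
    """Map a (feature, customer) pair to a stable bucket from 0 to 99."""
    key = f"{feature}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % 100


def feature_on(feature: str, customer_id: str,
               percentage: int, opted_out: set) -> bool:
    """Decide whether a customer gets the feature at this canary percentage."""
    if customer_id in opted_out:
        return False  # opted-out customers never receive the canary
    # Buckets below the percentage are in; raising it only adds customers.
    return bucket(feature, customer_id) < percentage
```

Setting the percentage to 0 disables the feature for everyone, which corresponds to Operations switching off a buggy feature, and a customer who is enabled at a lower percentage stays enabled as the percentage increases.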
Once a feature achieves 100% rollout, the development team is notified. After a period of time, they will remove the feature toggle from the codebase and enable the feature by default.
Engineering productivity has drastically increased since adopting this new model. The time saved is now spent on maintaining high quality and delivering features sooner. This success will catalyze similar evolutions with the goal of continually improving and moving forward.