Quality is a critical property of a software product. Quality doesn’t spontaneously happen but must be actively engineered in.

XenServer Engineering does this through our “Quality Engineering” process which goes something like this:

  • Define and agree quality goals for a XenServer release
  • Define and resource a plan to meet the goals (the “quality plan”)
  • Track and report progress against the quality plan, taking corrective action where necessary

And because quality is too important to leave to the quality fairies, we appoint a “Quality Manager” to be accountable for all of this.

The Quality Manager ensures quality has an equal place at the table whenever release decisions are being taken, stopping us from falling into the all-too-common trap of silently trading out quality in favour of scope or time. I have previously written about this in more detail here. In this blog we take a look inside the quality plan for Dundee, a XenServer release currently under development.

Dundee Quality Goals

First, and most importantly, we have a set of quality goals relating directly to Dundee release quality. The headline target is couched in terms of Customer Raised Unique Defects (CRUDs) at 180 days after release. As shown in Figure 1, we have been tracking this metric for a number of releases over a number of years.

The breakthrough in quality by this measure came with the XenServer 6.2 release, where an increased investment in quality paid off with a very significant reduction in the number of CRUDs. XenServer 6.5 maintained this trend of improvement. For Dundee our target is to at least match the quality of XenServer 6.5 as measured by CRUDs. This is a more aggressive target than it sounds given the exciting and complex features being developed for XenServer Dundee.

Figure 1 – XenServer CRUDs since 2010

As CRUDs are a lagging indicator (we don’t know whether we have met the goal until 180 days after release), we supplement the CRUD goal with “in-project” targets that we believe to be positively correlated with CRUDs. The main one we use is a function of in-project defect levels. We have rich historical data on how the number of unresolved defects at various stages of XenServer projects (notably at time of release) maps to CRUDs.
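To make the idea of mapping in-project defect levels to eventual CRUDs concrete, here is a minimal sketch using a simple least-squares fit. The historical data points are entirely fictional and the function names are my own; this illustrates the shape of such a predictor, not our actual model or data.

```python
# Hypothetical sketch: fitting CRUDs-at-180-days against the number of
# unresolved defects at release, over past releases. All numbers below
# are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# (unresolved defects at release, CRUDs at release + 180 days) -- fictional
history = [(420, 95), (310, 70), (180, 42), (150, 35)]
slope, intercept = fit_line([h[0] for h in history],
                            [h[1] for h in history])

def predicted_cruds(unresolved_at_release):
    """Project CRUDs from the current unresolved-defect count."""
    return slope * unresolved_at_release + intercept

print(round(predicted_cruds(160)))
```

A real model would of course weight defects by severity and consider multiple project stages, but even a crude fit like this turns a lagging indicator into something that can be tracked during the project.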

Further, we have a methodology for assessing defects and focussing our effort on the ones most likely to result in CRUDs, or in customer dissatisfaction generally. This also ensures we minimize effort spent fixing defects that are unlikely to impact our customers. We also break our overall CRUD target down into sub-goals targeting specific product areas, features or capabilities in order to focus attention where it is most needed and where we can get most bang per buck.

We identify these areas by analysing sources such as CRUDs from previous releases and intel from our tech support teams. In Dundee we have a general target to not regress any of our performance and scale KPIs as well as some specific improvement targets.

We also have a set of “indirect” quality goals. These are goals intended to drive engineering efficiency and effectiveness, to optimize use of our project budget and thereby ensure we can devote a greater share of our energy to building great features with great quality. In other words these goals have the side-effect of making our releases more predictable and our “direct” quality goals easier to reach.

For Dundee, we have goals aimed at improving “in project” test pass rates and quality reporting. We also have targets for a metric we call “automation signal-noise ratio,” meaning the proportion of test case failures attributable to product bugs rather than test infrastructure issues.
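The signal-noise metric itself is straightforward to compute once failures have been triaged to a root cause. The sketch below is illustrative only; the field names and categories are assumptions, not our actual tooling.

```python
# Hypothetical sketch: computing an "automation signal-noise ratio" from
# triaged test failures. Category labels are illustrative assumptions.

def signal_noise_ratio(failure_causes):
    """Proportion of test failures attributable to product bugs ("signal")
    rather than test-infrastructure issues ("noise")."""
    if not failure_causes:
        return None  # no failures this period; the ratio is undefined
    signal = sum(1 for cause in failure_causes if cause == "product-bug")
    return signal / len(failure_causes)

# Each failure in the period is triaged to a root cause.
failures = ["product-bug", "infra-flake", "product-bug", "infra-flake",
            "infra-flake", "product-bug", "product-bug", "infra-flake"]
print(f"signal-noise ratio: {signal_noise_ratio(failures):.0%}")  # 50%
```

A low ratio means engineers spend their time chasing flaky infrastructure rather than real bugs, which is exactly the waste the efficiency goals are meant to drive out.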

Dundee Quality Plan

Great, we’ve got some goals. But how are we going to achieve them? The Dundee Quality Plan includes a set of actions aimed at ensuring we meet our quality targets and a set of detailed quality criteria to be met at various internal milestones throughout the project duration.

Before the project execution got underway we required the Quality Plan to be reviewed and agreed by the Engineering team as a whole, with commitment to provide all necessary resources. Example actions in the Dundee Quality Plan are:

  • extend use of static analysis to find bugs early
  • improve system test planning on complex features by systematizing new techniques piloted on XenServer 6.5
  • extend the coverage of our automated interop testing (i.e. testing new XenServer builds with other Citrix products)
  • ensure a full program of automated system test, including functional, stress and performance test, is carried out every 2 weeks
  • run a staged program of alpha releases and a tech preview

Keeping everyone honest

Even with agreed Quality Goals and a signed-off Quality Plan, vigilance is required to ensure quality remains at the forefront of everyone’s minds! To this end, the Quality Manager publishes a bi-weekly quality report. This contains data on test coverage and test pass rates for the period, defect inflow and outflow stats and a general report on “quality risks and issues.”
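The inflow/outflow stats in such a report boil down to counting defects opened and resolved within the reporting period. A minimal sketch, assuming a simple dated-record data model (not our actual reporting tool):

```python
# Illustrative sketch: defect inflow, outflow and net change for a
# reporting period, from dated defect records. The data model is assumed.
from datetime import date

def defect_flow(defects, start, end):
    """Count defects opened (inflow) and resolved (outflow) in [start, end]."""
    inflow = sum(1 for d in defects if start <= d["opened"] <= end)
    outflow = sum(1 for d in defects
                  if d.get("resolved") and start <= d["resolved"] <= end)
    return {"inflow": inflow, "outflow": outflow, "net": inflow - outflow}

defects = [
    {"opened": date(2015, 9, 1),  "resolved": date(2015, 9, 10)},
    {"opened": date(2015, 9, 8),  "resolved": None},
    {"opened": date(2015, 9, 12), "resolved": date(2015, 9, 13)},
]
print(defect_flow(defects, date(2015, 9, 7), date(2015, 9, 14)))
# {'inflow': 2, 'outflow': 2, 'net': 0}
```

A positive net over several periods is an early warning that the open-defect count is drifting away from the in-project target.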

This report is a major highlight of project review meetings. If the Quality Manager deems that the team is not on target to meet the quality goals then corrective action must be taken – it is not an option to ignore it, or silently trade out quality to meet release deadlines. This has proven a robust methodology for ensuring the quality of XenServer over a number of releases.

Figure 2 – fragments from bi-weekly Dundee Quality Reports, on the left a report from very early in the project and on the right a report from a point nearing the Dev Complete milestone. Note the increased level of test coverage as well as the improved pass rates.


What about the next release?

The next major XenServer release after Dundee is codenamed Ely.

It’s not yet time to identify Ely’s quality goals, let alone the quality plan to meet them. However, what we can say is that we will continue our strategy of pursuing incremental improvements to both product quality and engineering efficiency and effectiveness.

We will continue to base the former on up-to-date feedback from the field and from our tech support experts. We will base the latter on our own analysis of engineering KPIs.

One area that has had significant attention in Dundee, but which will require continued work in Ely, is reducing the feedback time between code check-ins and bug detection. A fruitful area here is complementing our powerful automated system-level testing with more extensive unit and component level testing.

Achievements in this area will have a direct positive impact on efficiency and effectiveness, and in doing so have an indirect positive impact on product quality due to the time and effort savings that can be reinvested in improving product quality!