Although a lot has been written about predicting defects, the problem is still valid. Some systems have more defects than others. In academia, we can do two things – educate young engineers in making better software or construct models for predicting where and when to find defects.
A lot of work in the defect prediction model development focuses on more-or-less randomly found releases. However, software development is not random, but structured, and often, continuous. This means that it’s important to understand that not all defects are found in the same release/path/commit as they are introduced (BTW: there is a lot of work on this aspect too).
In this work, the authors analyze 120 continuous releases of six software products and demonstrate the value of their prediction models. The novelty of this approach is a system that checks whether the releases are similar to one another based on the distributional characteristics. This means that the prediction models are tuned to each release based on these characteristics. These characteristics are, mostly, well-known metrics like the average cyclomatic complexity of a file, a MaxInheritanceTree of a class, etc. So – easy to collect and analyze, a lot of tools can be used for that.
The results, in short, show that the new method is better than randomly choosing a release or bagging releases. The results differ per project, but the approach is better than the other two, across the board.
I like the approach and will try it the next time I get my hands on software defects, issues, challenges. Let’s see when that happens:)