Software data fuels AI, ML and Software Analytics

In the previous post I talked about software analytics, in particular about the latest issue of IEEE Software. In this post, let me introduce an interesting book for software engineers and software engineering researchers interested in software analytics: Bird, C., Menzies, T., & Zimmermann, T. (Eds.). (2015). The Art and Science of Analyzing Software Data. Elsevier.

After reading a few chapters, one conclusion emerged: modern software analytics is not about algorithms; it’s about data and its collection. It’s about measurement, quantification and metrics. Even qualitative data is often analysed using measurements in order to speed up the analysis.

Harvard Business Review claimed that “Big Data is Not the New Oil”, as there are fundamental differences between the scarce fossil fuel and the abundant data from software projects (https://hbr.org/2012/11/data-humans-and-the-new-oil). However, even though data is not scarce, I believe that it will fuel the software industry for at least one more decade.

Therefore, we still need to teach our students how to work with data, how to collect and analyse it, and how to assess its value. We also need to understand how to monetise the data.

Software analytics, the next thing for software metrics in modern companies

The hot summer in Europe provided a lot of time for relaxation and contemplation :) I’ve spent some of the warm days reading articles for the upcoming SEAA session on software analytics, which is a follow-up to the special issue of IST: https://doi.org/10.1016/j.infsof.2018.03.001

Software analytics, simply put, is the use of data and its visualisation to make decisions about software development. The typical data sources, both in the literature and as observed in many companies, are:

  1. Source code measurements from Git
  2. Defect data from JIRA
  3. Requirements data
  4. Customer data, a.k.a. field data
  5. Performance/profiling data from running the system
  6. Process data from time reporting systems, Windows journals, etc.

These data sources allow us to find bottlenecks both in the performance of our software and in the progress of our development work.
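To make the first data source in the list above a bit more concrete, here is a minimal sketch in Python of one possible source code measurement from Git: per-file churn (lines added plus lines deleted over the history). It is only an illustration, not a method from the posts or the book. The function names `parse_numstat` and `churn_for_repo` are made up for this example, and the sketch assumes that `git` is installed and that the output comes from `git log --numstat --format=`.

```python
import subprocess
from collections import Counter


def parse_numstat(numstat_text: str) -> Counter:
    """Sum added + deleted lines per file from `git log --numstat` output."""
    churn = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines between commits
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return churn


def churn_for_repo(repo_path: str) -> Counter:
    """Run git in repo_path (assumes git is on the PATH) and return per-file churn."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format="],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_numstat(result.stdout)


if __name__ == "__main__":
    # Print the ten files with the highest churn in the current repository.
    for path, lines_changed in churn_for_repo(".").most_common(10):
        print(f"{lines_changed:8d}  {path}")
```

Files that end up at the top of such a list are only candidates for a closer look; the measurement becomes more useful for decisions when it is combined with the other sources, for example defect data. Note that renamed files appear under their rename notation in the `--numstat` output, which this sketch does not resolve.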

Software analytics has been at the heart of such paradigms as the minimum viable product (MVP) from The Lean Startup, where it provides the ability to steer which features are developed and which are abandoned.

Our experiences with software analytics are described in chapter 5 of the book Software Development Measurement Programs: https://www.springer.com/us/book/9783319918358