February 2017 – SE metrics (Software Engineering)

Metrics research has gained a lot of attention from the Software Center and the results showed that tackling the complexity requires a holistic approach. Now, recently I’ve encountered a book which talked about the same principles, although at a more beginner level – “Your code as a Crime Scene” by Adam Tornhill.

The book uses metaphors of crime scene investigations to describe troubleshooting the code. I recommend this as the starting point. Readers who are interested in the going deeper into this topic should look at one of the recent PhD thesis from software center, by Dr. Vard Antinyan.

The thesis investigates the complexity of the software code, requirements and test cases. The conclusions from the work is that we can monitor the evolution of complexity using very common measures — e.g. McCabe complexity combined with the number of changes in the code. Dr. Antinyan provided even a number of tools to monitor the complexity.

The thesis can be found here: http://web.student.chalmers.se/~vard/files/Thesis.pdf.

In our recent work we tackled the problem of spending way too much effort on maintaining the measuring instruments (or metric tools). When the measured entity changes you need to rewrite the script and keep two or three or five billion versions of it.

So, we played with an idea of “teaching” an algorithm how to count so that everytime the entity changes we can “re-teach” the algorithm, but not re-program it.

Guess what – it worked! We played with the LOC metric and got over 90% accuracy on the first try. Cost of re-designing the measuring instrument to adjust to new information needs – almost 0 (null, nil).

Take a look at this paper of ours: https://gup.ub.gu.se/publication/249619, and the paper

Abstract:

Background: The results of counting the size of programs in terms of Lines-of-Code (LOC) depends on the rules used for counting (i.e. definition of which lines should be counted). In the majority of the measurement tools, the rules are statically coded in the tool and the users of the measurement tools do not know which lines were counted and which were not. Goal: The goal of our research is to investigate how to use machine learning to teach a measurement tool which lines should be counted and which should not. Our interest is to identify which parameters of the learning algorithm can be used to classify lines to be counted. Method: Our research is based on the design science research methodology where we construct a measurement tool based on machine learning and evaluate it based on open source programs. As a training set, we use industry professionals to classify which lines should be counted. Results: The results show that classifying the lines as to be counted or not has an average accuracy varying between 0.90 and 0.99 measured as Matthew’s Correlation Coefficient and between 95% and nearly 100% measured as the percentage of correctly classified lines. Conclusions: Based on the results we conclude that using machine learning algorithms as the core of modern measurement instruments has a large potential and should be explored further

Capture

Month: February 2017

Code complexity and its visualisation

How to use machine learning to build a flexible measuring instrument