In our recent work we tackled the problem of spending way too much effort on maintaining the measuring instruments (or metric tools). When the measured entity changes you need to rewrite the script and keep two or three or five billion versions of it.
So, we played with an idea of “teaching” an algorithm how to count so that everytime the entity changes we can “re-teach” the algorithm, but not re-program it.
Guess what – it worked! We played with the LOC metric and got over 90% accuracy on the first try. Cost of re-designing the measuring instrument to adjust to new information needs – almost 0 (null, nil).
Background: The results of counting the size of programs in terms of Lines-of-Code (LOC) depends on the rules used for counting (i.e. definition of which lines should be counted). In the majority of the measurement tools, the rules are statically coded in the tool and the users of the measurement tools do not know which lines were counted and which were not. Goal: The goal of our research is to investigate how to use machine learning to teach a measurement tool which lines should be counted and which should not. Our interest is to identify which parameters of the learning algorithm can be used to classify lines to be counted. Method: Our research is based on the design science research methodology where we construct a measurement tool based on machine learning and evaluate it based on open source programs. As a training set, we use industry professionals to classify which lines should be counted. Results: The results show that classifying the lines as to be counted or not has an average accuracy varying between 0.90 and 0.99 measured as Matthew’s Correlation Coefficient and between 95% and nearly 100% measured as the percentage of correctly classified lines. Conclusions: Based on the results we conclude that using machine learning algorithms as the core of modern measurement instruments has a large potential and should be explored further