Challenges when using ML for SE (article review)


104294.pdf (scitepress.org)

Machine learning has been used in software engineering for a while now. It used to be called advanced statistics, but with the popularization of artificial intelligence, we use the term machine learning more often. I’m one of those who like to use ML. It’s actually a mesmerizing experience when you train neural networks: change one parameter, wait a bit, see how the network performed, then do it all over again. Trust me, I’ve done it all too often.

I like this paper because it focuses on the challenges of using ML. From the abstract:

In the past few years, software engineering has increasingly automating several tasks, and machine learning tools and techniques are among the main used strategies to assist in this process. However, there are still challenges to be overcome so that software engineering projects can increasingly benefit from machine learning. In this paper, we seek to understand the main challenges faced by people who use machine learning to assist in their software engineering tasks. To identify these challenges, we conducted a Systematic Review in eight online search engines to identify papers that present the challenges they faced when using machine learning techniques and tools to execute software engineering tasks. Therefore, this research focuses on the classification and discussion of eight groups of challenges: data labeling, data inconsistency, data costs, data complexity, lack of data, non-transferable results, parameterization of the models, and quality of the models. Our results can be used by people who intend to start using machine learning in their software engineering projects to be aware of the main issues they can face.

So, what are these challenges? Well, I’m not going to go into details about all of them, but I’d like to focus on the one that is closest to my heart: data labelling. The process of labelling, or tagging, data is usually very time-consuming and very error-prone. You need to be able to remember how you actually labelled the previous data points (consistency), but also understand how to think when you come across new cases. The paper does not list the specific labelling challenges itself, but it points to a few papers where they are defined.
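One way to make the consistency problem concrete is to label the same data points twice and measure how well the two passes agree. The sketch below is my own illustration, not something taken from the paper: it computes Cohen's kappa, a common chance-corrected agreement measure, and the function name `cohen_kappa` as well as the example labels are made up for the purpose.

```python
from collections import Counter


def cohen_kappa(first_pass, second_pass):
    """Cohen's kappa between two labelling passes over the same items."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("both passes must label the same non-empty set of items")
    n = len(first_pass)
    # Share of items that received the same label in both passes.
    observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Agreement expected by chance, given how often each label was used.
    counts_a = Counter(first_pass)
    counts_b = Counter(second_pass)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used one and the same label for everything
    return (observed - expected) / (1 - expected)


# Hypothetical example: six review comments, labelled twice by the same person.
first = ["bug", "bug", "style", "style", "bug", "style"]
second = ["bug", "style", "style", "style", "bug", "bug"]
kappa = cohen_kappa(first, second)
print(f"kappa = {kappa:.2f}")
```

A value close to 1 suggests the labelling is stable between passes, while a value close to 0 suggests the agreement is no better than chance, which is usually a sign that the labelling guidelines need to be written down more precisely.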