Exploring code weaknesses in Stack Overflow

https://doi.ieeecomputersociety.org/10.1109/TSE.2021.3058985

Whether we like it or not, software designers, programmers and architects use Stack Overflow, mostly because they want to be part of a community – to help others and to help themselves.

However, Stack Overflow has also become the de facto go-to place for finding programming answers. Oftentimes, these answers suggest using libraries or other ready-made solutions. These libraries solve the immediate problem, but they can also introduce vulnerabilities that the programmers are not aware of.

In this article, the authors study how weaknesses are introduced and revised in C/C++ code on Stack Overflow. From the introduction: “We scan 646,716 C/C++ code snippets from Stack Overflow answers. We observe that code weaknesses are detected in 2% of the C/C++ answers with code snippets; more specifically, there are 12,998 detected code weaknesses that fall into 36% (i.e., 32 out of 89) of all the existing C/C++ CWE types.”
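To get a feeling for what such scanning looks like in practice, here is a minimal sketch – my own illustration, not the authors’ pipeline – that writes a snippet to a temporary file and runs a static analyzer over it. The choice of Cppcheck is an assumption on my part; any CWE-aware analyzer would do.

import subprocess
import tempfile

def scan_snippet(code: str) -> str:
    # Write the C snippet to a temporary file so the analyzer can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(code)
        path = f.name
    # Run Cppcheck with all checks enabled; it reports findings on stderr.
    result = subprocess.run(["cppcheck", "--enable=all", path],
                            capture_output=True, text=True)
    return result.stderr

# An out-of-bounds write (CWE-787): index 3 is past the end of a 3-element array.
print(scan_snippet("int f(void) { int a[3]; a[3] = 1; return a[3]; }"))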

I like that the paper presents a number of good examples, which can be used for training software engineers, both at the university level and later during their work. Some of them can even be used to create coding guidelines for companies – including both good and bad examples.

The paper has a lot of great findings about the way in which weaknesses and vulnerabilities are introduced, for example: “92.6% (i.e., 10,884) of the 11,748 Codew has weaknesses introduced when their code snippets were initially created on Stack Overflow, and 69% (i.e., 8,103 out of 11,748) of the Codew has never been revised.”

I strongly recommend reading the paper and giving it to your software engineers to scan…

Crowdsmelling…

Image by Ajale from Pixabay

https://arxiv.org/abs/2012.12590

The concept of crowdsourcing is well known in our community. We are accustomed to reading others’ code and learning from it while, at the same time, improving it. Even CAPTCHAs are a good example of crowdsourcing.

However, crowdsmelling? Well, the idea is not as outrageous as one might think; it is actually an interesting one. It is essentially a way of using collective knowledge about code smells to train machine learning models to recognize them. It is the very idea that we use, and support, in our Software Center project.

In this paper, the authors focus on a special kind of code smells – the ones linked to technical debt. The results are promising, and we should keep an eye on this work to see how it improves.

From the abstract: “Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forrest).”
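As an illustration of the idea – my own sketch, not the authors’ setup, and the metrics and labels below are made up – collective labels from developers can be used to train a Naive Bayes classifier on simple class-level metrics:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical class-level metrics: lines of code, number of methods, coupling.
X = np.array([
    [1200, 45, 30],
    [150, 8, 4],
    [2300, 60, 41],
    [90, 5, 2],
    [400, 12, 9],
    [1800, 52, 35],
])
# Labels gathered from the crowd of developers: 1 = God Class, 0 = not.
y = np.array([1, 0, 1, 0, 0, 1])

# Cross-validated ROC AUC, the same measure reported in the paper.
scores = cross_val_score(GaussianNB(), X, y, cv=3, scoring="roc_auc")
print("Mean ROC AUC:", scores.mean())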