Understanding what’s going on helps you become a better software developer…

Image by Twighlightzone from Pixabay

10.1109/MS.2020.3014223

I’m a big fan of the Matrix movies but, to be honest, who isn’t? :) I like the scene where Morpheus gives Neo the choice of two pills: one to know the truth and the other to go on living his life as before.

Well, sometimes I feel the same when I do my programming tasks – do I really want to know what the code does, or just make a quick fix and move on? I would say that it’s 50-50 for me – sometimes I feel like contributing and sometimes I just fix the problem and move on.

In this paper, the authors conduct an experiment to understand how and when software developers make mistakes. They find that “[the] study suggests that a relatively high number of mistakes are related to communicating with stakeholders outside of the development team.”

Having worked with metrics teams all over the globe, I’ve noticed that communication with the stakeholders is often the largest problem that you can have. The stakeholders don’t speak “requirements” and we do not understand the “wants” of the stakeholders. But, well, that’s not what the paper is about.

What I like about the paper is the systematic approach to the study – using experiments and a technique for teaching the developers how to work with their limitations. This is what the authors recommend as remedies (quoted directly from the paper):

  • Know your own weaknesses. Every developer is different and struggles with different concepts. Our analysis shows a variety of types of errors that developers make. Developers becoming more conscious of the human errors they commonly make and actively checking for these can help reduce errors.
  • Use cognitive training. We have shown that using cognitive training, like the OODA loop, seems to help decision making and can reduce the human errors a developer makes.
  • Simplify your workload. One of the biggest causes of human error reported by the developers in our study was the complexity of the development environment. Reducing the cognitive load by simplifying the complexity of the development environment could reduce human errors. Actions such as minimizing the number of simultaneous development tasks and closing down unnecessary tools and windows can help reduce the cognitive load.
  • Communicate carefully with stakeholders outside your team. Our study suggests that a relatively high number of mistakes are related to communicating with stakeholders outside of the development team. Ensuring that communication is clearly understood seems important to reducing mistakes.

Consistency in code reviews (article review)

Image by press 👍 and ⭐ from Pixabay

tse2020_hirao.pdf (uwaterloo.ca)

In the last year, I’ve written a lot about code reviews, mostly because this is where I put my effort now and where I see that software engineers could improve.

Although there are a lot of studies about how good code reviews are and what kind of benefits they bring, there is no doubt that code reviews are a tiresome task. You read software code and try to improve it but, let’s be honest, if it works, don’t break it, right?

In this paper, the authors study open source communities and check how often the reviewers actually agree on the code review score. They find that disagreement is not rare: in up to 37% of the cases, the scores diverge. From the paper: “How often do patches receive divergent scores? Results: Divergent review scores are not rare. Indeed, 15%–37% of the studied patch revisions that receive review scores of opposing polarity”

They also study how the divergence actually influences the patches, that is, whether they are integrated or not: “Patches are integrated more often than they are abandoned. For example, patches that elicit positive and negative scores of equal strength are eventually integrated on average 71% of the time. The order in which review scores appear correlates with the integration rate, which tends to increase if negative scores precede positive ones.”

Finally, they study when the discussions/disagreements happen and how many reviewers there actually are: “Patches that are eventually integrated involve one or two more reviewers than patches without divergent scores on average. Moreover, positive scores appear before negative scores in 70% of patches with divergent scores. Reviewers may feel pressured to critique such patches before integration (e.g., due to lazy consensus). Finally, divergence tends to arise early, with 75% of them occurring by the third (QT) or fourth (OPENSTACK) revision.”
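To make the notion of “divergent scores” a bit more concrete, here is a minimal Python sketch of how one could count them. It is not the analysis from the paper: the function names and the sample review histories are made up for illustration, and it simply assumes that each score is an integer where a positive value means approval and a negative value means rejection.

```python
from typing import Iterable, Mapping, Sequence


def is_divergent(scores: Iterable[int]) -> bool:
    """True if a patch received at least one positive and one negative score."""
    scores = list(scores)
    return any(s > 0 for s in scores) and any(s < 0 for s in scores)


def positive_came_first(scores: Sequence[int]) -> bool:
    """For a divergent patch: was the first non-zero score a positive one?"""
    first = next(s for s in scores if s != 0)
    return first > 0


def divergence_rate(patches: Mapping[str, Sequence[int]]) -> float:
    """Share of patches (0.0 to 1.0) whose scores have opposing polarity."""
    if not patches:
        return 0.0
    divergent = sum(is_divergent(scores) for scores in patches.values())
    return divergent / len(patches)


# Hypothetical review histories: patch id -> scores in the order they were given.
reviews = {
    "patch-a": [+1, +2],
    "patch-b": [+1, -1, +2],
    "patch-c": [-1, 0, +1],
    "patch-d": [0, +1],
}

rate = divergence_rate(reviews)  # two of the four patches are divergent
```

With real data, the same two questions (how often do scores diverge, and which polarity came first) can be asked per project, which is roughly the kind of breakdown the quotes above report.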

I think that these results say something about our community: we tend to disagree, but we integrate the code anyway. What does that mean?

It could mean two things, which IMHO are equally valid:

  1. The review comments do not really touch upon crucial aspects and therefore are deemed not so important (e.g. whether we call something weatherType or typeOfWeather as a variable…)
  2. The reviewers’ reputation makes it difficult to get some of the comments through, e.g. when a junior reviewer is calling for a complete overhaul of the architecture.

Either way – I think that the modern code review field is quite active these days and I hope that we can get something done about the speed and quality of these long and tiresome code review processes.

AI for decision makers…

Image by Gerd Altmann from Pixabay

In the last post of 2020, I would like to wish everyone Merry X-Mas and a fantastic 2021. Well, I guess that a normal 2021 would also work.

I would like to thank all my collaborators so far. I hope that I could contribute to your work at least half of what you did for me.

To end on a positive note, if you are interested in how to use AI for making decisions – here is the link to the seminar material that I developed together with GUSEE (GU executive education school): AI for Decision Makers – GU Play, Göteborgs universitet

Using skillset to do something different – helps me to reinvent myself and get more fun…

Image by Pexels from Pixabay

2020 was a year like no other. Everyone can agree with that. The pandemic changed our lives a lot, and the pace of digitalization has gone from that of a tortoise to that of a SpaceX rocket!

For me, this year has also changed a lot of things. I’ve moved into the new field of medical signal analysis using ML. I realized that my skillset can be used to help people. Maybe not the ones who were hit by the pandemic, but still people who need our help.

Together with a team of great specialists from the Sahlgrenska University Hospital, we managed to create a set-up for collecting data in the operating room, tagging it and then, finally, applying ML to it.

In the last three months, we managed to move from 0 to having three articles in the making, collecting data from several patients, fantastic accuracy and a great deal of fun.

Here is the link to the movie that describes our work: CHAIR – GU Play, Göteborgs universitet

I’ve reflected upon this project and it’s probably the project where I had the most fun during 2020. It’s a completely new set-up, great team, extreme energy in the work and a great deal of meaning behind it.

The project was partially sponsored by Chalmers CHAIR initiative. Thank you!

Data labelling – activity that makes people hate ML….

Image by S. Hermann & F. Richter from Pixabay

Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies | SpringerLink

Machine learning is hungry for data. The more you have, the happier it will be. It seems very easy when we learn how to program ML and how it works, as there are plenty of open data sources to practice and learn from.

However, when we want to use ML for our purposes, things get a bit more complicated. There is a lot of data, but not in the right format. The one that is in the right format is incomplete. The one that is complete, is noisy. The one that is not noisy is too little. We need to collect more. And so the story goes on, and on, and on….

Collecting the data is not that problematic, as it can often be automated. At least in software engineering, automotive, telecom, transport/logistics and medicine. These are the ones I know, anyway. What is problematic, though, is data labelling. It is the activity where we take each data point and add a class to it, or its label if we speak machine-learnish. The person doing the labelling needs to be competent to label the data correctly: they need to know the domain, know the data and know the context. This person also needs to have a fantastic memory, because the labels need to be consistent. The labels also need to be unambiguous given the underlying feature vector.

In this paper, colleagues from our department study the process of data labelling and its challenges.

They find the following to be selected examples of challenges:

  • Lack of a systematic approach to labeling data for specific features
  • Unclear responsibility for labeling
  • Noisy labels
  • Difficulty to find a correlation between labels and features
  • Skewed label distributions
  • Time dependence
  • Difficulty to predict future uses for datasets

I think it’s great work and good reading for everyone who wants to get into ML for real, start using it at a company and understand whether it actually gives any benefit.

From the abstract: Labeling is a cornerstone of supervised machine learning. However, in industrial applications, data is often not labeled, which complicates using this data for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these still do not provide accurate and reliable labels for every machine learning use case in the industry. In this context, the industry still relies heavily on manually annotating and labeling their data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using a semi-structured interview with data scientists at two companies to explore their problems when labeling and annotating their data. This paper provides two contributions. We identify industry challenges in the labeling process, and then we propose mitigation strategies for these challenges.

Testing machine learning systems…

Image by Comfreak from Pixabay

https://rdcu.be/caKuc

Today, everybody is talking about machine learning and AI. Some talk about deterministic models, some about statistical ones, some about Bayesian ones, some talk about X-mas 🙂

My experience of working with machine learning is that we need to be very careful about what we actually do. If we do machine learning in the classical sense, e.g. neural network models or decision trees, then we need to make sure that we test the system alongside the data, never together with the data. We need to prepare a dataset that we use as a reference and which we know well.

Testing, in that scenario, becomes just like we know it. We can make the calculations manually, or just step-by-step, and we can check whether the algorithm behaves the way we calculated.
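As an illustration of what such a test could look like, here is a minimal sketch in Python. The nearest-centroid classifier, the function names and the four reference points are all assumptions I made for the example; the point is only that the reference dataset is small enough to calculate the expected result by hand.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Return a dict: label -> centroid (tuple of per-feature means)."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: tuple(mean(column) for column in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest to the given row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Reference dataset: small enough that the centroids can be computed by hand.
REFERENCE_ROWS = [(1.0, 1.0), (3.0, 1.0), (7.0, 9.0), (9.0, 9.0)]
REFERENCE_LABELS = ["low", "low", "high", "high"]


def test_centroids_match_hand_calculation():
    centroids = fit_centroids(REFERENCE_ROWS, REFERENCE_LABELS)
    # By hand: "low" is the mean of (1, 1) and (3, 1); "high" of (7, 9) and (9, 9).
    assert centroids == {"low": (2.0, 1.0), "high": (8.0, 9.0)}


def test_prediction_on_known_points():
    centroids = fit_centroids(REFERENCE_ROWS, REFERENCE_LABELS)
    assert predict(centroids, (2.0, 2.0)) == "low"
    assert predict(centroids, (8.0, 8.0)) == "high"
```

The two test functions follow the naming convention that pytest picks up, but they can just as well be called directly. Because the reference data never changes, a failing test points at the code and not at the data.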

Testing the system is also not difficult if we follow principles of good engineering – separation of concerns, modularization, observability.

At runtime, we need to make sure that we add mechanisms for handling such aspects as out-of-bounds distributions, for example safety cages for ML algorithms.
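One very simple way to read the “safety cage” idea is a wrapper that refuses to call the model when the input lies outside the range seen during training. The Python sketch below shows that reading only; the class name, the per-feature range check and the toy model are my assumptions, and a real safety cage would probably use richer checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SafetyCage:
    """Wraps a model and only lets in-bounds inputs reach it."""

    model: Callable[[Sequence[float]], float]
    lower: Sequence[float]  # smallest training value per feature
    upper: Sequence[float]  # largest training value per feature

    @classmethod
    def from_training_data(cls, model, rows):
        """Derive the per-feature bounds from the training rows."""
        columns = list(zip(*rows))
        return cls(model, [min(c) for c in columns], [max(c) for c in columns])

    def in_bounds(self, row: Sequence[float]) -> bool:
        """True if every feature lies within the range seen in training."""
        return len(row) == len(self.lower) and all(
            lo <= x <= hi for x, lo, hi in zip(row, self.lower, self.upper)
        )

    def predict(self, row: Sequence[float]) -> Optional[float]:
        """Return the model output, or None if the input is out of bounds."""
        if not self.in_bounds(row):
            return None  # the caller decides on a fallback, e.g. a safe default
        return self.model(row)


# Hypothetical model and training data, only to exercise the cage.
def toy_model(row):
    return 2 * row[0] + row[1]


training_rows = [(0.0, 10.0), (5.0, 20.0), (2.0, 15.0)]
cage = SafetyCage.from_training_data(toy_model, training_rows)

inside = cage.predict((1.0, 12.0))   # within the training ranges
outside = cage.predict((9.0, 12.0))  # first feature is beyond the training maximum
```

Returning `None` instead of a prediction keeps the decision about what to do next outside of the ML component, which fits the separation of concerns mentioned above.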

Either way, I recommend this article for all ML designers and product managers who want to know what’s the state of the art in this field, from the perspective of testing. A good overview, nice reading!

Who and when needs automated code reviews…

https://rdcu.be/caKsW

Image by Arek Socha from Pixabay

Having worked with code reviews for a while, I strongly sympathize with the thesis put forward by the authors of this paper: code review tools are still far from being supportive of software developers.

Yes, they do automate the process and organize it. Yes, they help in assuring that all code is reviewed and yes, they do help to capture problems in the code and help to spread the knowledge.

However, what I expect from such a tool is to help me to find problems in the code. I would like to have a tool that would help me, as a designer, get better: avoid mistakes, use cool programming constructs, make better design. None of the tools I know help with that.

This paper shows that my understanding is similar to that of the developers studied. Automatically fixing and suggesting documentation was the top priority. Renaming suggestions, commenting and explaining were some of the others.

Detection of duplicated code, architectural analysis and similar things were also mentioned as expectations. I cannot agree more! These things are priority 1 – I would also expect them to be there.

Now, some are more difficult than others, like analyzing the architecture. Not a trivial task at all, because what is the architecture? Where are the patterns? How do we find them in the code? How can we rely on the tools that research provides? We’re not there yet.

Duplicate code, however, is something we should be able to fix. I’ve looked at some repository that had over 200 papers about code clones, duplicates and what have you. Are all these papers good? Probably not, but even if 10% is good, then here we have 20 tools we can try.
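Just to show how low the entry barrier is, here is a deliberately naive Python sketch of duplicate detection: it slides a window over the non-blank lines of each file and reports windows that occur more than once. It is not one of the tools from those papers; the function names, the window size and the two example files are assumptions made for the illustration, and it only finds copies that differ in whitespace.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not hide a copy."""
    return " ".join(line.split())


def find_duplicate_blocks(
    files: Dict[str, str], window: int = 3
) -> Dict[Tuple[str, ...], List[Tuple[str, int]]]:
    """Map each repeated block of `window` lines to its (file, start line) places."""
    seen = defaultdict(list)
    for name, source in files.items():
        numbered = [
            (number, normalize(text))
            for number, text in enumerate(source.splitlines(), start=1)
        ]
        numbered = [(n, t) for n, t in numbered if t]  # skip blank lines
        for i in range(len(numbered) - window + 1):
            block = tuple(text for _, text in numbered[i:i + window])
            seen[block].append((name, numbered[i][0]))
    return {block: places for block, places in seen.items() if len(places) > 1}


# Two hypothetical files that share a three-line body.
files = {
    "a.py": "def area(w, h):\n    total = w * h\n    print(total)\n    return total\n",
    "b.py": "def size(w, h):\n    total  =  w * h\n    print(total)\n    return total\n",
}

clones = find_duplicate_blocks(files)
```

For the two files above, the only repeated block is the three-line function body, which starts on line 2 in both files. Renamed variables or reordered statements would slip through, and that is where the research tools come in.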

I agree, we do have SonarQube and similar tools, but they are not integrated with code review. I cannot just link to a report from SQ when writing a review comment. I cannot add a review comment to a detected technical debt in SQ. So, no integration then?

Maybe it’s just a Friday afternoon thing, but I hope that we can get better at making the last mile with our tools. I hope that we can address the expectations that the developers have …

Data analytics in SE

https://www.sciencedirect.com/science/article/abs/pii/S0950584920301981

Image by Werner Weisser from Pixabay

A few years ago, data analytics and big data were super popular in software engineering. In fact, they were a bit too popular, as many authors quoted big data because they had a diagram in the paper.

Fast forward to today and the situation is a bit different. We are more mature in using data in software development. We know that big data is about the 5 Vs and that we can reason about it. We also know that providing the diagrams is not the same as using them to direct software development.

I found this paper when looking for literature for our new work on communication in software metrics teams. My colleagues study the communication and found that there can be several sources of confusion. Now, this paper is NOT about the confusion, but about prevalence of data analytics in software engineering. The working definition of Big Data Analytics is as follows in the paper: “Big data analytics is the process of using analysis algorithms running on powerful supporting platforms to uncover potentials concealed in big data, such as hidden patterns or unknown correlations”.

The paper poses three main research questions about the studies conducted in Big Data Analytics, about the approaches used and when they are used. I’m mostly interested in the second – which approaches are used. There, the authors pose three sub-questions:

RQ2.1: What types of analytics have been used in the ASD domain?
RQ2.2: What sources of data have been used?
RQ2.3: What methods, models, or techniques have been utilized in the studies?

In particular, the second one is the most interesting one – sources of data. There, the authors found that there are plenty. The entire table (Table 7 in the paper) is actually too large to quote, but let me just quote one of the categories: Source code and data model:

  • Source code
  • Ruby programs & Ruby on Rails
  • Java programs
  • Function calls
  • Code metrics
  • Development repository
  • Test case
  • Code quality
  • Application data schema

I recommend this as a good reading into the current state-of-the-art in data analytics in software engineering. I think we’ve matured a lot in the last decade as a community and that brings a lot of benefit. Our software development gets better and thus our software gets better.

From the abstract: In total, 88 primary studies were selected and analyzed. Our results show that BDA is employed throughout the whole ASD lifecycle. The results reveal that data-driven software development is focused on the following areas: code repository analytics, defects/bug fixing, testing, project management analytics, and application usage analytics.

Is confusion a factor when reviewing a code?

https://www.win.tue.nl/~aserebre/EMSE2020Felipe.pdf

Image by Myriams-Fotos from Pixabay

Reviewing the code is an art. After working with the topic for a few years, we’ve realized that this is like reading a chat, where one person responds to a message sent by another person. The message is often the code and the response is the review comment. What we’ve discovered is that the context of the review is important, as well as the possibility to ask questions. We even discuss having a taxonomy of these review comments to ease understanding of “where” in the review process one is at the moment.

This article caught my attention because it is about understanding when a reviewer is actually confused when reading the code and making the comment. It’s a very nice piece of work, as it combines code review comment analysis and surveys.

The results of the survey are interesting as they point out that the authors are confused much less than the reviewers – which is often caused by the fact that the comment is a response, while the code is the message. Quoting the paper: RQ1 Summary – Reasons for confusion: We found a total of 30 reasons for confusion. The most prevalent are missing rationale, discussion of the solution: non-functional, and lack of familiarity with existing code. We observe that tools (code review, issue tracker, and version control) and communication issues, such as disagreement or ambiguity in communicative intentions, may also cause confusion during code reviews.

Finally, I like the fact that the authors do a full systematic review on the topic and triangulate the results. This work will become a number one reading for my students in the programming course, which will teach them how important good code is!

From the abstract:

Results: From the first study, we build a framework with 30 reasons for confusion, 14 impacts, and 13 coping strategies. The results of the systematic mapping study shows 38 articles addressing the most frequent reasons for confusion. From those articles, we found 19 different solutions for confusion proposed in the literature, and nine impacts were established related to the most frequent reasons for confusion.

Is noise important in SE?

https://www.researchgate.net/profile/Khaled_Al-Sabbagh/publication/344190831_Improving_Data_Quality_for_Regression_Test_Selection_by_Reducing_Annotation_Noise/links/5f5a167aa6fdcc116404d72b/Improving-Data-Quality-for-Regression-Test-Selection-by-Reducing-Annotation-Noise.pdf

Image by F. Muhammad from Pixabay

Machine learning and deep learning are only as good as the data used to train them. However, even the best data sources can lead to data of non-optimal quality. Noise is one example of such data problems.

Our research team has studied the impact of noise on machine learning in software engineering, mostly on the testing data. In this paper, we present one technique to identify noise, measure it and reduce it. There are several techniques to do it, but we use one of the more robust ones: removal of noise.
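To give a feeling for what “identify, measure and remove” can mean in practice, here is a minimal Python sketch. It is not the algorithm from the paper. It works under one stated assumption: an entry counts as noisy when the same feature vector appears in the dataset with more than one label. The function names and the example entries are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Entry = Tuple[Sequence[int], Hashable]  # (feature vector, label)


def split_noisy(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Split entries into (clean, noisy); noisy = same features, different labels."""
    labels_seen = defaultdict(set)
    for features, label in entries:
        labels_seen[tuple(features)].add(label)

    clean, noisy = [], []
    for features, label in entries:
        contradicted = len(labels_seen[tuple(features)]) > 1
        (noisy if contradicted else clean).append((features, label))
    return clean, noisy


def noise_ratio(entries: List[Entry]) -> float:
    """Share of entries (0.0 to 1.0) that contradict another entry."""
    if not entries:
        return 0.0
    _, noisy = split_noisy(entries)
    return len(noisy) / len(entries)


# Hypothetical test-result data: the first two entries contradict each other.
entries = [
    ((1, 0, 1), "fail"),
    ((1, 0, 1), "pass"),
    ((0, 1, 1), "pass"),
    ((0, 0, 1), "fail"),
]

ratio = noise_ratio(entries)     # measure
clean, _ = split_noisy(entries)  # remove: keep only the uncontradicted entries
```

Removal is the bluntest option, since it throws away every contradicted entry, but it needs no guess about which of the conflicting labels is the correct one.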

I recommend taking a look at how the algorithms work, and let us know if you find it interesting!