Consistency in code reviews (article review) – SE metrics (Software Engineering)

tse2020_hirao.pdf (uwaterloo.ca)

In the last year, I’ve written a lot about code reviews, mostly because this is where I put my effort now and where I see that software engineers could improve.

Although there is a lot of studies about how good code reviews are and what kind of benefits they bring, there is no doubt that code reviews are a tiresome task. You read software code and try to improve it, but, let’s be honest, if it works don’t break it – right?

In this paper, the authors study open source communities and check how often the reviewers actually agree upon the code review score. They find that it’s not that often – 37% disagree. From the paper: “How often do patches receive divergent scores? Results: Divergent review scores are not rare. Indeed, 15%–37% of the studied patch revisions that receive review scores of opposing polarity“

They also study how the divergence actually influences the patches – are they integrated or not: “Patches are integrated more often than they are abandoned. For example, patches that elicit positive and negative scores of equal strength are eventually integrated on average 71% of the time. The order in which review scores appear correlates with the integration rate, which tends to increase if negative scores precede positive ones. “

Finally, they study when the discussions/disagreements happen and how many reviewers there actually are: “Patches that are eventually integrated involve one or two more reviewers than patches without divergent scores on average. Moreover, positive scores appear before negative scores in 70% of patches with divergent scores. Reviewers may feel pressured to critique such patches before integration (e.g., due to lazy consensus).2 Finally, divergence tends to arise early, with 75% of them occurring by the third (QT) or fourth (OPENSTACK) revision. “

I think that these results say something about our community – that we tend to disagree, but do integrate the code anyways. What does that mean?

It could mean two things, which IMHO are equally valid:

The review comments do not really touch upon crucial aspects and therefore are deemed not so important (e.g. whether we call something weatherType or typeOfWeather as a variable…)
The reviewers’ reputation makes it difficult to get some of the comments through, e.g. when a junior reviewer is calling for a complete overhaul of the architecture.

Either way – I think that the modern code review field is quite active these days and I hope that we can get something done about the speed and quality of these long and tiresome code review processes.

Author: Miroslaw Staron

I’m professor in Software Engineering at Computer Science and Engineering. I usually blog about interesting articles (for me) and my own reflections on the development of Software Engineering, AI, computer science and automotive software. View all posts by Miroslaw Staron