Recently, I had an interesting discussion about code qualities that are seldom studied in software research. One example of such a quality is readability, which is the degree to which we can read the code correctly.
Low readability does not have to cause defects immediately, but in the long run it does. For software products that evolve over a long time, readability is dangerously close to understandability, and therefore also very close to modifiability and correctness.
I recently came across the following paper:
Scalabrino, S., Linares-Vásquez, M., Oliveto, R. and Poshyvanyk, D., 2017. A Comprehensive Model for Code Readability. Journal of Software: Evolution and Process.
The authors designed a set of textual features that help to quantify the readability of code. Let me quote the abstract:
“…the models proposed to estimate code readability take into account only structural aspects and visual nuances of source code, such as line length and alignment of characters. In this paper, we extend our previous work in which we use textual features to improve code readability models. We introduce 2 new textual features, and we reassess the readability prediction power of readability models on more than 600 code snippets manually evaluated, in terms of readability, by 5K+ people. […] The results demonstrate that (1) textual features complement other features and (2) a model containing all the features achieves a significantly higher accuracy as compared with all the other state‐of‐the‐art models. Also, readability estimation resulting from a more accurate model, ie, the combined model, is able to predict more accurately FindBugs warnings.”
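To make the difference between structural and textual features a bit more concrete, here is a minimal Python sketch. It is not the model from the paper: the function names, the two features, and the assumption that comments start with `//` are all illustrative choices made for this example.

```python
import re
from statistics import mean


def structural_features(snippet: str) -> dict:
    """Toy structural features: they look only at the shape of the code."""
    # Ignore blank lines so they do not drag the average down.
    lines = [ln for ln in snippet.splitlines() if ln.strip()]
    if not lines:
        return {"avg_line_length": 0.0, "max_line_length": 0}
    lengths = [len(ln) for ln in lines]
    return {"avg_line_length": mean(lengths), "max_line_length": max(lengths)}


def comment_identifier_overlap(snippet: str) -> float:
    """Toy textual feature (hypothetical, not taken from the paper):
    the share of words used in the code that also occur in its comments."""
    code_words, comment_words = set(), set()
    for ln in snippet.splitlines():
        # Assumption: everything after '//' on a line is a comment.
        code, _, comment = ln.partition("//")
        code_words.update(w.lower() for w in re.findall(r"[A-Za-z]+", code))
        comment_words.update(w.lower() for w in re.findall(r"[A-Za-z]+", comment))
    if not code_words:
        return 0.0
    return len(code_words & comment_words) / len(code_words)


if __name__ == "__main__":
    example = "int total = 0; // running total\nreturn total;"
    print(structural_features(example))
    print(comment_identifier_overlap(example))
```

The first function sees only line lengths, while the second looks at the words themselves. That is roughly the kind of complementary signal the abstract describes, although a real readability model would use many more features and would be trained on human ratings.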