Measuring readability of code…

Recently, I had an interesting discussion about code qualities that are seldom part of software research. An example of such a quality is readability, which is the degree to which we can read the code correctly.

Low readability does not have to lead to defects in the code, but in the long run it does. For software products that evolve over a long time, readability is dangerously close to understandability and therefore also very close to modifiability and correctness.

I’ve come across the following paper recently:

Scalabrino, S., Linares-Vásquez, M., Oliveto, R. and Poshyvanyk, D., 2017. A Comprehensive Model for Code Readability, published in the Journal of Software: Evolution and Process.

The authors designed a set of textual features that help to quantify readability. Let me quote the abstract:

“…the models proposed to estimate code readability take into account only structural aspects and visual nuances of source code, such as line length and alignment of characters. In this paper, we extend our previous work in which we use textual features to improve code readability models. We introduce 2 new textual features, and we reassess the readability prediction power of readability models on more than 600 code snippets manually evaluated, in terms of readability, by 5K+ people. […] The results demonstrate that (1) textual features complement other features and (2) a model containing all the features achieves a significantly higher accuracy as compared with all the other state‐of‐the‐art models. Also, readability estimation resulting from a more accurate model, ie, the combined model, is able to predict more accurately FindBugs warnings.”
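To make the "structural aspects" mentioned in the abstract a bit more concrete, here is a minimal sketch that computes a few of them (line length and indentation) for a snippet. The function name and the choice of features are assumptions for illustration only; this is not the feature set or the model from the paper.

```python
def structural_features(snippet: str) -> dict:
    """Compute a few simple structural features of a code snippet.

    Blank lines are ignored. Indentation is counted in leading spaces.
    """
    lines = [line for line in snippet.splitlines() if line.strip()]
    if not lines:
        return {"lines": 0, "avg_line_length": 0.0,
                "max_line_length": 0, "avg_indent": 0.0}
    lengths = [len(line) for line in lines]
    indents = [len(line) - len(line.lstrip(" ")) for line in lines]
    return {
        "lines": len(lines),
        "avg_line_length": sum(lengths) / len(lines),
        "max_line_length": max(lengths),
        "avg_indent": sum(indents) / len(lines),
    }
```

A readability model would combine many such features, including the textual ones the paper adds, and would be trained on human ratings; the sketch only shows what a single structural feature looks like.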

How to validate software measures – list of attributes from a systematic review

During the weekend I did some digging into the quality of measurement. In particular, I tried to answer a question from a colleague on measurement accuracy limits. Well, instead of digging into the accuracy, I ended up looking at the validation of measures in general.

I’ve been searching for how people evaluate software measures and I came across this nice paper from Laurie Williams and colleagues:

This systematic review lists 47 criteria used to evaluate software metrics, combining both the empirical and theoretical validation. Here is the list of what they found:

  • A priori validity
  • Actionability
  • Appropriate Continuity
  • Appropriate Granularity
  • Association
  • Attribute validity
  • Causal model validity
  • Causal relationship validity
  • Content validity
  • Construct validity
  • Constructiveness
  • Definition validity
  • Discriminative power
  • Dimensional consistency
  • Economic productivity
  • Empirical validity
  • External validity
  • Factor independence
  • Improvement validity
  • Instrument validity
  • Increasing growth validity
  • Interaction sensitivity
  • Internal consistency
  • Internal validity
  • Monotonicity
  • Metric Reliability
  • Non-collinearity
  • Non-exploitability
  • Non-uniformity
  • Notation validity
  • Permutation validity
  • Predictability
  • Prediction system validity
  • Process or Product Relevance
  • Protocol validity
  • Rank Consistency
  • Renaming insensitivity
  • Repeatability
  • Representation condition
  • Scale validity
  • Stability
  • Theoretical validity
  • Trackability
  • Transformation invariance
  • Underlying theory validity
  • Unit validity
  • Usability

The list is really impressive, but not all attributes apply to all types of metrics. So, one should always look at how the metric will be used and then seek the right type of validation for it. I recommend this article as great reading for those who are thinking about creating their own metrics :)

Research agenda for Continuous * (article highlight)

I’ve come across an interesting article summarizing the recent developments of continuous software engineering and related fields. The research has been done by Brian Fitzgerald and his colleagues from LERO:

I recommend reading this article. Here, I just list some points that caught my interest:

  • Feature analytics is still important and will remain important for both the development and operations
  • It’s more important to be continuous than to be fast – although I would argue that being slowly continuous is never a good thing, and
  • Discontinuous improvement seems to be more interesting than continuous improvement

The article even discusses what kind of recent *2017* developments could be observed in this area, and links them to well-known initiatives such as Lean and Agile.

How large companies innovate ….

Large software companies are really different from each other. That’s hardly a surprise, but do they work in different ways?

If we look at works like “Good to Great” or “Built to Last” by Jim Collins and his colleagues, we can see that they have similarities. They make the same mistakes and they have similar success factors.

In this paper, the authors conducted a literature review of how innovation is done in large companies. They found only seven companies, but they recognized a few interesting initiatives (descriptions quoted from the text):

  • intrapreneurship: intrapreneurs have the vision for new products and act on their vision as if they had their own companies: build the development team and run the business,
  • bootlegging: bootlegging (or underground or skunkworks) refers to the innovation activity that is hidden from management until its introduction. The objectives of bootlegging are pre-research, product and process improvement, troubleshooting, new product and process development and purely scientific research
  • internal venture: internal venture refers to the introduction of new business within existing business to pursue product or market innovation. New business can be established as the instrument to pursue incremental innovation (new product in current market or new market for current product) or radical innovation (new product for new market).
  • spin-off, subsidiaries, joint-ventures, and
  • crowdsourcing: getting the participation of crowd and locking the crowd to create value to one company only. By taking the advantage of Web 2.0, companies look for the suitable solutions from Internet users.

These approaches vary in size, structure and scope. I recommend this article as Friday reading, just before heading home 🙂

Full reference:

Henry Edison, Xiaofeng Wang, Ronald Jabangwe, Pekka Abrahamsson,
Innovation Initiatives in Large Software Companies: A Systematic Mapping Study,
Information and Software Technology, Volume 95, 2018, Pages 1-14, ISSN 0950-5849,

Continuous and collaborative technology transfer: Software engineering research with real-time industry impact – interesting article

I’ve been browsing the latest issue of IST and this article caught my attention. The article is written by Tommi Mikkonen, Casper Lassenius, Tomi Männistö, Markku Oivo and Janne Järvinen. It is about technology transfer from academia to industry. It’s available at:

The best point in this article is a very important one: the technology is NOT created in academia and transferred to industry; it is rather created either in industry or in collaboration with academia. This observation invalidates many of the technology transfer models, where the authors assume that the companies receive the results from academia.

But has this actually happened? How often does it really happen? I guess, not very often.

The paper presents a model of collaboration, which is shown in the following link (and figure):

I’m happy to see more collaboration models for industry-academia co-creation of results!

Using thresholds (a la risk) to predict quality of software modules

I often tell my students that the absolute values of measures do not always say much. Take McCabe cyclomatic complexity as an example: a value of 100 (meaning 100 independent paths through a method) does not have to denote a problem. It could be a large switch statement which changes the locale based on the IP address type. However, it is important to monitor thresholds of measures, set based on the meaning of the function and the problem at hand.
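A rough sketch of this point: the code below approximates cyclomatic complexity for Python source by counting decision points with the standard `ast` module and then compares the result with a threshold. The counting rule is a simplification of McCabe's measure, and the function names, the sample dispatch function and the threshold values are assumptions made up for illustration.

```python
import ast

# Node types counted as one decision point each (a simplification).
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as decision points + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two additional branches
            complexity += len(node.values) - 1
    return complexity


def exceeds_threshold(source: str, threshold: int) -> bool:
    """True when the approximate complexity is above the given threshold."""
    return approx_cyclomatic_complexity(source) > threshold


# Hypothetical dispatch function: many branches, but trivial to understand.
LOCALE_DISPATCH = '''
def pick_locale(kind):
    if kind == "ipv4":
        return "en"
    elif kind == "ipv6":
        return "sv"
    elif kind == "local":
        return "de"
    else:
        return "en"
'''
```

Every extra `elif` in such a dispatch function raises the number by one, even though the function stays easy to read. That is why the threshold should depend on what the function does rather than being one number for all code.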

In this article from IST, “Software metrics thresholds calculation techniques to predict fault-proneness: An empirical comparison”, we can learn about three different techniques for finding thresholds for software measures: ROC curves, VARL, and Alves ranking (named after the author of the method). The article shows how well we can predict the fault-proneness of modules if we use thresholds rather than absolute values.

Enjoy the reading!
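To illustrate the idea behind ROC-based thresholds, here is a minimal sketch that tries every observed metric value as a threshold and keeps the one with the best trade-off between true and false positives. Picking the winner by Youden's J (TPR minus FPR) is an assumption of this sketch; the article may use a different selection criterion, and the function name is hypothetical.

```python
def roc_threshold(values, faulty):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    values: metric value per module
    faulty: 1/True if the module is known to be faulty, else 0/False
    A module is predicted fault-prone when its value >= threshold.
    """
    positives = sum(1 for f in faulty if f)
    negatives = len(faulty) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both faulty and non-faulty modules")

    best = None
    for t in sorted(set(values)):
        tp = sum(1 for v, f in zip(values, faulty) if v >= t and f)
        fp = sum(1 for v, f in zip(values, faulty) if v >= t and not f)
        tpr, fpr = tp / positives, fp / negatives
        j = tpr - fpr
        # Strict '>' keeps the lowest threshold when several tie.
        if best is None or j > best[0]:
            best = (j, t, tpr, fpr)
    return best[1], best[2], best[3]
```

With real data, `values` would be, for example, the complexity of each module and `faulty` would come from the defect database. The resulting threshold then replaces a judgment on the absolute value.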

Exchanging metrics and measurement observations – the SMM from OMG

In our research and development work, we often spend a lot of time on structuring the measurement information. As difficult as it is to do it upfront, we still manage to get something if we base our work on the ISO/IEC 15939 standard and its measurement information meta-model.

However, the challenge arises when we want to provide a structure over an existing database or if we want to exchange the measurement information (values, definitions, etc.).

Here, we have the Structured Metrics Metamodel (SMM) from OMG to the rescue. The meta-model provides the types and relations to describe the measurement data. At first glance, one could think that this is a meta-model to structure your measurement database, but it is not. The structure of the database needs to be much simpler, whereas the exchange format has to be more complex. The complexity stems from the fact that one needs to make sure that two meta-levels are transferred: the measurement data and its meta-data. It is mostly the meta-data that is complex, and therefore the entire format gets complex.

I recommend taking a look at the structure of the meta-model to understand the complexity of measurement processes.

However, there is a spoiler – this is NOT bedtime reading!
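The idea of transferring two meta-levels can be sketched as follows. This is not the SMM schema; the class names, the fields and the JSON layout are hypothetical and only show why an exchange format needs the definitions (meta-data) travelling together with the values (data).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MeasureDefinition:
    """Meta-data: what a number means (hypothetical fields)."""
    name: str
    unit: str
    scope: str  # kind of entity being measured


@dataclass
class Measurement:
    """Data: one observed value."""
    measure: str  # refers to MeasureDefinition.name
    entity: str
    value: float


def export_bundle(definitions, measurements) -> str:
    """Serialize both levels so the receiver can interpret the values."""
    known = {d.name for d in definitions}
    missing = {m.measure for m in measurements} - known
    if missing:
        raise ValueError(f"measurements without definitions: {sorted(missing)}")
    return json.dumps(
        {
            "definitions": [asdict(d) for d in definitions],
            "measurements": [asdict(m) for m in measurements],
        },
        indent=2,
    )
```

A local database could store only the `Measurement` rows, because the definitions are known inside the organization. As soon as the data leaves the organization, the definitions have to be shipped along, and that is where most of the complexity comes from.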

How to measure software architectures?

The question of what to measure pops up very often in our work. It’s not easy to answer, but we give guidance for areas of software engineering. In this blog post, I focus on the area of software architectures.

For the software architectures, there are a few areas that are important:

  • Stability of the architecture
  • Complexity of the architecture
  • and Quality of the architecture

These areas can be quantified with measures such as interface stability, component complexity, technical debt and similar. Based on our work with the companies in Software Center, we realized that these measures are more important than more complex measures like architecture weight, architecture stability index or architectural debt. The reason is that the simple measures provide an immediate understanding of what can be done to improve the situation.

In our work, we also looked at how to calculate these measures based on open data sets of complexity of automotive software.

If you are interested in the details of how to use the measures, please take a look at Chapter 7 of my book:

The beginning of your measurement program – the model

Software development measurement programs can take many shapes and forms. They are often aligned with the goals of their organizations. In this blog post I would like to provide a short summary of the most common measurement models which can be encountered in the field of software engineering today. The set of models is naturally growing all the time, so new suggestions are more than welcome.

To put this blog post in context, let’s start from the ISO 15939 standard, which is the official standard for measurement programs in the software and systems engineering industry. The standard provides both the vocabulary and the recommended hierarchical model of information needs, indicators and measures. The strength of this model is its unification with the other ISO standards, like the newest ISO 25000 series, and in particular the ISO 2502x parts, i.e., quality measurement division.

However, ISO 15939 is not the only standard in this field; there are others too. Some of them are better known than others, but most of the standards in my list are used somewhere.

  • GQM – the classical Goal Question Metric model, which is very flexible and can be used for almost any measurement program and purpose. Its structure is similar to the structure of the measurement information model of ISO 15939, but it does not contain such concepts as information need and indicator.
  • GQIM – the extension of the classical model to include the notion of the indicator.
  • Lightweight GQM – a version of GQM for small and medium enterprises. The lightweight keyword here means that
  • MIS-PyME MCMM – a measurement maturity model, constructed in a similar manner to CMMI, but aligned with measurement needs and only about the measurement programs. To some extent, this model provides examples of measures to be used at different levels and in different areas.
  • SQIP – a model based on the CMMI requirements for the measurement programs in more mature organizations, from level 2 upwards (if I remember correctly).
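The goal, question and metric hierarchy of the classical GQM from the list above can be sketched as a small data structure. The class and method names are assumptions for illustration; GQM itself does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)

    def all_metrics(self):
        """Distinct metric names needed to answer all questions, in order."""
        seen = []
        for question in self.questions:
            for metric in question.metrics:
                if metric.name not in seen:
                    seen.append(metric.name)
        return seen
```

One metric can serve several questions, which is why `all_metrics` removes duplicates. GQIM would add an indicator level between the question and the metric.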

And this is not the end of the list. We have more, like:

  • QFD
  • AAHA
  • LQIM
  • HSC
  • Tarc
  • OMSD
  • SCAPT,
  • …. and some more…

These other approaches seem to be a bit more niche in their adoption, both in industry and in academia. If you know of, use, or have heard of any other approaches, please let me know.


How to develop automotive software ….

In 2015 I had a chance to co-organize a workshop on automotive software architectures. The workshop brought together many researchers and practitioners to discuss the issues related to designing large automotive software systems. After the workshop I realized that there is still a need to describe how to design the automotive software from the perspective of a software engineer.

Thanks to my editor, Ralf, who approached me and asked for a book on automotive software architectures, we now have the opportunity to present the results.

The book is primarily for software engineers who want to understand how the automotive software is designed, deployed and standardized.


This book introduces the concept of software architecture as one of the cornerstones of software in modern cars. Following a historical overview of the evolution of software in modern cars and a discussion of the main challenges driving that evolution, Chapter 2 describes the main architectural styles of automotive software and their use in cars’ software. In Chapter 3, readers will find a description of the software development processes used to develop software on the car manufacturers’ side. Chapter 4 then introduces AUTOSAR – an important standard in automotive software. Chapter 5 goes beyond simple architecture and describes the detailed design process for automotive software using Simulink, helping readers to understand how detailed design links to high-level design. Next, Chapter 6 presents a method for assessing the quality of the architecture – ATAM (Architecture Trade-off Analysis Method) – and provides a sample assessment, while Chapter 7 presents an alternative way of assessing the architecture, namely by using quantitative measures and indicators. Subsequently Chapter 8 dives deeper into one of the specific properties discussed in Chapter 6 – safety – and details an important standard in that area, the ISO/IEC 26262 norm. Lastly, Chapter 9 presents a set of future trends that are currently emerging and have the potential to shape automotive software engineering in the coming years.

This book explores the concept of software architecture for modern cars and is intended for both beginning and advanced software designers. It mainly aims at two different groups of audience – professionals working with automotive software who need to understand concepts related to automotive architectures, and students of software engineering or related fields who need to understand the specifics of automotive software to be able to construct cars or their components. Accordingly, the book also contains a wealth of real-world examples illustrating the concepts discussed and requires no prior background in the automotive domain.

Available at Amazon:

And Springer: