article review – Page 8 – SE metrics (Software Engineering)

What do elite software developers do in software projects?

https://dl-acm-org.ezproxy.ub.gu.se/doi/10.1145/3387111

Image by Jose B. Garcia Fernandez from Pixabay

A while back I read an article in ZDNet about Linus Torvalds, the creator of Linux, and his daily work. He was (at the time of reading, which is ca. 2 years back) still working on the code. However, he was mostly working on the design of the system, reviewing patches and supporting younger designers. I’ve also read a number of articles which claimed the importance of code reviews as a way of teaching younger designers about the product and the code base.

In this paper, I’ve found that the support for younger designers is what the elite developers do a lot of. It seems that the communication, organisation and support are the activities that the elite developers find important. It’s aligned with what we do at the universities as well. The most elite professors work with students, show them how to program and how to structure their code. Seems like this is a very good way of continuing your career – help other be better.

I guess it’s time to change my wallpaper from “coding” to “teaching”….

Abstract: Open source developers, particularly the elite developers who own the administrative privileges for a project, maintain a diverse portfolio of contributing activities. They not only commit source code but also exert significant efforts on other communicative, organizational, and supportive activities. However, almost all prior research focuses on specific activities and fails to analyze elite developers’ activities in a comprehensive way. To bridge this gap, we conduct an empirical study with fine-grained event data from 20 large open source projects hosted on GITHUB. We investigate elite developers’ contributing activities and their impacts on project outcomes. Our analyses reveal three key findings: (1) elite developers participate in a variety of activities, of which technical contributions (e.g., coding) only account for a small proportion; (2) as the project grows, elite developers tend to put more effort into supportive and communicative activities and less effort into coding; and (3) elite developers’ efforts in nontechnical activities are negatively correlated with the project’s outcomes in terms of productivity and quality in general, except for a positive correlation with the bug fix rate (a quality indicator). These results provide an integrated view of elite developers’ activities and can inform an individual’s decision making about effort allocation, which could lead to improved project outcomes. The results also provide implications for supporting these elite developers.

Can we predict quality of service level quality based on metrics?

link to paper: https://doi.org/10.1016/j.infsof.2020.106313

Image by 3D Animation Production Company from Pixabay

Understanding how your product performs in the field is a very hot topic. To be honest, it’s always been. When we design and develop our product, we often want to know whether the outcome is going to be good or not. Well, this is not an easy task because we cannot really know to runtime properties of our products before very close to the final version. It’s difficult to measure end-to-end response time if we only have half of the database, or if we cannot simulate the full load of the product.

In this paper, the authors analysed over 700 web services and measured their code quality and interface quality as predictors for the quality of service.

From the abstract: source code and interface quality metrics/antipatterns are correlated with web service quality attributes (response time, availability, throughput, successability, reliability, compliance, best practices, latency, and documentation).

The internal code/interface quality measures were:

NPT: Number of port types, Interface
NOPT: Average number of operations in port types, Interface
NBS: Number of services, Interface
NIPT: Number of identical port types, Interface
NIOP: Number of identical operations, Interface
ALPS: Average length of port-types signature, Interface
AMTO: Average meaningful terms in operation names, Interface
AMTM: Average meaningful terms in message names, Interface
AMTMP: Average meaningful terms in message parts, Interface
AMTP: Average meaningful terms in port-type names, Interface
NOD: Number of operations declared, Code
NAOD: Number of accessor operations declared, Code
ANIPO: Average number of input parameters in operations, Code
ANOPO: Average number of output parameters in operations, Code
NOM: Number of messages, Code
NBE: Number of elements of the schemas, Code
NCT: Number of complex types, Code
NST: Number of primitive types, Code
NBB: Number of bindings, Code
NPM: Number of parts per message, Code
COH Cohesion: The degree of the functional relatedness of the operations of the service, Code
COU Coupling: A measure of the extent to which inter-dependencies exist between the service modules, Code
ALOS: Average length of operations signature, Code
ALMS: Average length of message signature, Code

This is quite a collection of measures and they are quite interesting, e.g. the average meaningful terms in port-type names. I must admit that it’s a new measure that I’ve not seen before.

The measures of the quality of service were:

Response Time: Time taken to send a request and receive a response, QoS
Availability: How often is the service available for consumption, QoS
Throughput: Total Number of invocations for a given period of time, QoS
Successability: Number of response / number of request messages, QoS
Reliability: Ratio of the number of error messages to total messages, QoS
Compliance: The extent to which a WSDL document follows WSDL specification, QoS
Best Practices: The extent to which a web service follows WS-I Basic Profile, QoS
Latency: Time taken for the server to process a given request, QoS
Documentation: Measure of documentation (i.e. description tags) in WSDL, QoS

Some of these are also quite interesting, e.g. Successability, which is the number of response messages per request messages.

The authors also measured some anti patterns of service design, which they list in the paper. I will not go through these anti-patterns, but I think that they are also correlated with some of the metrics.

I suggest this reading to everyone interested in web services and service design. Perhaps this could help us to get more high-performance services. I hope that I can see more of this type of research in the domain of service security – that’s an interesting area in itself.

Engineering AI systems – differences to engineering “other” software systems…

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9121629

Being a software engineer working with AI for a while, I noticed that the engineering of AI systems is different. Well, maybe not building the actual system, but the way in which the knowledge about quality, testing and maintenance differ.

In this article, IEEE Software’s Editor in Chief presents her view on the topic. The main point is that this engineering is both similar and different. This quote from the paper summarizes it nicely: “I argue that our existing design techniques will not only help us make progress in understanding how to design, deploy, and sustain the structure and behavior of AI-enabled systems, but they are also essential starting points. I suggest that what is different in AI engineering is, in essence, the quality attributes for which we need to design and analyze, not necessarily the design and engineering techniques we rely on. “

One of the differences is the process of development. It is not aligned with the non-ML systems, e.g. in terms of training, testing, maintenance. ML systems are data-centric and this needs to be reflected in the AI engineering processes.

Ipek Ozkaya discusses the following misconceptions about the differences:

We can specify systems – both AI and non-AI systems cannot really be fully specified,
System correctness can be verified – we can never fully verify systems, neither AI-based on non-AI based (e.g. due to complexity),
We can avoid hidden dependencies,
We can manage system change propagation,
Frameworks do it all,
We can build reliable systems from unreliable and unpredictable subcomponents

I recommend this article to get a quick overview of the gist of the differences and misconceptions.

Problems with engineering AI systems

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007/s10994-020-05872-w

Engineering machine learning systems is much more than train-evaluate cycles. It means that we need to systematically integrate these ML systems with the rest of the component. We need to build safety-cages to ensure that the decisions are not out-of-bounds and we need to make sure that we can maintain these systems.

In this paper, the authors studied an example of automated driving vehicles, not fully autonomous (but still) and shown the challenges that we need to solve before AI and ML becomes one of our “fellow drivers” on the roads.

The findings of the paper show that it’s not going to happen soon. As the authors say in the abstract: “Our results show that machine learning models are characterized by a lack of requirements specification, lack of design specification, lack of interpretability, and lack of robustness. We also perform a gap analysis on a conventional system quality standard SQuaRE with the characteristics of machine learning models to study quality models for machine learning systems. We find that a lack of requirements specification and lack of robustness have the greatest impact on conventional quality models. “

The authors provide a process for machine learning models as part of safety critical software, where the designing of the system and its real-scenario validation are a bit more apart than traditionally.

What I really like about the paper is the gap analysis of ML systems and ISO 25000 quality model. For example, as shown in this table: https://link-springer-com.ezproxy.ub.gu.se/article/10.1007/s10994-020-05872-w/tables/2

Actionable software metrics – an interesting new article

https://dl.acm.org/doi/pdf/10.1145/3383219.3383244

Working with metrics is a domain which calls for empirical data, which constantly changes. Software companies evolve and their metric programs evolve. I’ve always been interested how the metrics data is used in companies, especially in other geographical regions than the nordics. Although there are differences between companies in Sweden, Danmark or Finland, these companies are still more similar to each other than companies within the same domain in other parts of the world – and that’s perfectly fine.

In this paper, the authors captured my attention because they studied a few companies that were not on my radar before. The author have also found an interesting angle on the metrics work – what makes a metric actionable?

As it turns out, there are a few things that make the metrics actionable:

being practical
inform decision-making, and
exhibit data quality

These three characteristics are very important and agreed upon by most of the respondents. The authors also recognise the need of the metrics to be temporal, i.e. relevant for the information needs at hand.

What I also liked about the paper is that they provide the link to their data, which is the set of metrics used by the studied companies – a very interesting list: https://doi.org/10.5281/zenodo.3580893

However, what is interesting, and contradictory to what we have observed in our work, is the fact that the metrics which are focused on specific projects/products or universal, were not that popular. Instead, the metrics should be applicable for multiple products/projects, i.e. the type of the measured entity should contain more than one instance.

So, are the non-actionable metrics a complete waste of time then? Well, I would not say so. Neither do the authors. The non-actionable metrics can still be informative. They can be used to raise awareness of the issue, or simply provide the means for monitoring of the situation or the product, without the need to trigger specific decisions. Examples – product sales numbers, customer satisfaction, etc. Hard to act on them directly, but very important to collect and monitor.

Emerging new field – DUO – Data mining and optimization (EMSE article)

New sub-areas or fields within software engineering are not that common, but they come up once in a while. The authors of this article (https://doi-org.ezproxy.ub.gu.se/10.1007/s10664-020-09808-9, Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers) argue that this is the case now.

In this article, the authors provide a view that data mining and building optimization models are done in tandem and that this is the new field. They show that the data mined from repositories influences optimization models and that the development of models influences data mining.

The authors make the following claims (quoted from the paper, references removed):

Claim1:For software engineering tasks, optimization and data mining are very similar. Hence, it is natural and simple to combine the two methods.
Claim2:For software engineering tasks. optimizers can greatly improve data miners. A data miner’s default tuners can lead to sub-optimal performance. Automatic optimizers can find tunings that dramatically improve that performance.
Claim3:For software engineering tasks, data miners can greatly improve optimization. If a data miner groups together related items, an optimizer can explore and report conclusions that are general across a set of solutions. Further, optimization for SE problems can be very slow. But if that optimization executes over the groupings found by a data miner, that inference can terminate orders of magnitude faster.
Claim4:For software engineering tasks, data mining without optimization is not recommended. Conclusions reached from an unoptimized data miner can be changed, sometimes even dramatically improved, by running the same tuned learner on the same data. Researchers in data mining should, therefore, consider adding an optimization step to their analysis.

These claims make a lot of sense and they are aligned with my observations. I recommend this article for everyone who is working at or developing a metric team or a data analysis/data science team.

Building interactive dashboards

Review of the book Interactive Visual Data Analysis: https://www.taylorfrancis.com/books/9781315152707 by Christian Tominski, Heidrun Schumann

Designing a good dashboard is an art. We need to answer questions like – who will use the dashboard? for what? when? and how will the interaction happen. Our team has studied dashboards and developed a model for choosing which one is the most suitable one ( https://gupea.ub.gu.se/bitstream/2077/41120/1/gupea_2077_41120_1.pdf ).

However, we never studied what is important when constructing a dashboard, but the authors of this book did.

I’m super happy to have stumbled upon this book, because it has shown me how to think when constructing dashboards. It shows which elements are important and how to create actionable visual analytics, not how to use JavaScript to create a diagram.

I liked chapter 3 and 4 the most. They show how to move from simple visualizations to interactions and how to work with parameters of the visualizations. However, one needs to read chapter 2 as well to get some inspiration about which diagrams to use and how to use the visual attentive attributes like color, size or shape – both separately and in combination.

To sum up, if you want to construct a good dashboard, please read the book and I can promise you, it will not dissapoint you.

How do you know if you are disrupted?

Or how do the banks know that…

Recently I’ve read a very interesting book about the disruption that happens in the banking sector. I’ve learnt that this is not the first book about the topic and I wanted to understand how things are actually working with AI and the banking sector.

We’ve published a paper a while back about the introduction to AI for bankers, but I wanted to know more about what has happened since then. The link to the paper is here: https://doi.org/10.2308/jeta-19-04-30-21

So, what’s new about this particular book? Or what’s new about the disruption?

The book talks a lot about the financial sector and how it evolved over the years. Bank 4.0 is a bank which is powered by AI and is able to reason about your economy at a higher abstraction level. Instead of asking “Siri, how much money do I have on my account?”, we can start asking questions like “Siri, can I affort buying the new Playstation 5?” and the assistant will answer that you can, but then you need to cut down on your vacation expenses and maybe replan the purchase of the new mobile phone (which is getting old).

A small disclaimer here: I use Siri as example, this has nothing to do with Apple services (at least nothing that I’m aware of).

Another interesting aspect of banking 4.0 is the fact that financial actors are more trustworthy than the banks. The young generation would rather interact with the actors like WeChat (https://metrics.blogg.gu.se/?p=407) than interact with the bank.

However, what I like best about the book is the fact that it problematizes the concepts related to disruption – for example how do you know that you are being disrupted? or when should a company start pivoting, or even whether pivoting or abandonment of the old model is possible in the company that is being disrupted.

Tools in Agile development

https://arxiv.org/pdf/2004.00289.pdf

Agile software development is here to stay. Software engineering has been changed completely when it arrived and I cannot see how it will change back to waterfall.

Even the so-called post-agile methodologies are built on the same premises – collaboration within teams, empowerement, taking responsibility. Since we educate our future software engineers to be self-organizing, to take responsibility for their topics, projects and personal development, we will live with see a development in this direction.

At the same time, the tools that are needed to support these teams evolve. When I started working as a software engineer, Rational ClearCase was the go-to tool for version control. Now, my students don’t even recognize it. Well, Rational does not exist any more either – the last time I checked it was bought by IBM, no one knows what happens now.

In this paper, the authors take a look at how modern collaboration patterns look like and which tools are used there. They have observed that consolidation of tools (e.g. towards Slack) introduced very positive dynamics to the team. However, they also found that some automated notifications (e.g. from Jenkins) got lost in these new tools, which was not appreciated.

This case study is a longitudinal one, which is a great example of how a research team builds up a long-term collaboration, so important for modern academia and industry!