Miroslaw Staron – Page 15 – SE metrics (Software Engineering)

Actionable software metrics – an interesting new article

https://dl.acm.org/doi/pdf/10.1145/3383219.3383244

Working with metrics is a domain which calls for empirical data, which constantly changes. Software companies evolve and their metric programs evolve. I’ve always been interested how the metrics data is used in companies, especially in other geographical regions than the nordics. Although there are differences between companies in Sweden, Danmark or Finland, these companies are still more similar to each other than companies within the same domain in other parts of the world – and that’s perfectly fine.

In this paper, the authors captured my attention because they studied a few companies that were not on my radar before. The author have also found an interesting angle on the metrics work – what makes a metric actionable?

As it turns out, there are a few things that make the metrics actionable:

being practical
inform decision-making, and
exhibit data quality

These three characteristics are very important and agreed upon by most of the respondents. The authors also recognise the need of the metrics to be temporal, i.e. relevant for the information needs at hand.

What I also liked about the paper is that they provide the link to their data, which is the set of metrics used by the studied companies – a very interesting list: https://doi.org/10.5281/zenodo.3580893

However, what is interesting, and contradictory to what we have observed in our work, is the fact that the metrics which are focused on specific projects/products or universal, were not that popular. Instead, the metrics should be applicable for multiple products/projects, i.e. the type of the measured entity should contain more than one instance.

So, are the non-actionable metrics a complete waste of time then? Well, I would not say so. Neither do the authors. The non-actionable metrics can still be informative. They can be used to raise awareness of the issue, or simply provide the means for monitoring of the situation or the product, without the need to trigger specific decisions. Examples – product sales numbers, customer satisfaction, etc. Hard to act on them directly, but very important to collect and monitor.

What to discuss about Deep Learning? – an EMSE article review

A study about what the developers discuss regarding Deep Learning and whether this differs across different frameworks is an interesting summer discussion topic: https://doi-org.ezproxy.ub.gu.se/10.1007/s10664-020-09819-6

The paper reviews the comments of developers who comment and/or post questions about three deep learning frameworks: Theano, Tensorflow and PyTorch. I’ve got interested in the paper because I wanted to see whether the communities using these frameworks differ. Myself, I’ve got introduced to Tensorflow a while back and keps using it. Since I’m not an ML researcher, the framework does not really matter for me, but I still would like to know whether I should read upon some new framework during the summer.

The observations quoted from the abstract:

1) a wide range of topics that are discussed about the three deep learning frameworks on both platforms, and the most popular workflow stages are Model Training and Preliminary Preparation.

2) the topic distributions at the workflow level and topic category level on Tensorflow and PyTorch are always similar while the topic distribution pattern on Theano is quite different. In addition, the topic trends at the workflow level and topic category level of the three deep learning frameworks are quite different.

3) the topics at the workflow level show different trends across the two platforms. e.g., the trend of the Preliminary Preparation stage topic on Stack Overflow comes to be relatively stable after 2016, while the trend of it on GitHub shows a stronger upward trend after 2016.

It’s interesting that the topics are roughly the same, but I’m a bit surprised that the topics are mostly about the data management/machine learning and not the frameworks themselves. This means that applications win over development of the frameworks – at least at the moment.

Emerging new field – DUO – Data mining and optimization (EMSE article)

New sub-areas or fields within software engineering are not that common, but they come up once in a while. The authors of this article (https://doi-org.ezproxy.ub.gu.se/10.1007/s10664-020-09808-9, Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers) argue that this is the case now.

In this article, the authors provide a view that data mining and building optimization models are done in tandem and that this is the new field. They show that the data mined from repositories influences optimization models and that the development of models influences data mining.

The authors make the following claims (quoted from the paper, references removed):

Claim1:For software engineering tasks, optimization and data mining are very similar. Hence, it is natural and simple to combine the two methods.
Claim2:For software engineering tasks. optimizers can greatly improve data miners. A data miner’s default tuners can lead to sub-optimal performance. Automatic optimizers can find tunings that dramatically improve that performance.
Claim3:For software engineering tasks, data miners can greatly improve optimization. If a data miner groups together related items, an optimizer can explore and report conclusions that are general across a set of solutions. Further, optimization for SE problems can be very slow. But if that optimization executes over the groupings found by a data miner, that inference can terminate orders of magnitude faster.
Claim4:For software engineering tasks, data mining without optimization is not recommended. Conclusions reached from an unoptimized data miner can be changed, sometimes even dramatically improved, by running the same tuned learner on the same data. Researchers in data mining should, therefore, consider adding an optimization step to their analysis.

These claims make a lot of sense and they are aligned with my observations. I recommend this article for everyone who is working at or developing a metric team or a data analysis/data science team.

Building interactive dashboards

Review of the book Interactive Visual Data Analysis: https://www.taylorfrancis.com/books/9781315152707 by Christian Tominski, Heidrun Schumann

Designing a good dashboard is an art. We need to answer questions like – who will use the dashboard? for what? when? and how will the interaction happen. Our team has studied dashboards and developed a model for choosing which one is the most suitable one ( https://gupea.ub.gu.se/bitstream/2077/41120/1/gupea_2077_41120_1.pdf ).

However, we never studied what is important when constructing a dashboard, but the authors of this book did.

I’m super happy to have stumbled upon this book, because it has shown me how to think when constructing dashboards. It shows which elements are important and how to create actionable visual analytics, not how to use JavaScript to create a diagram.

I liked chapter 3 and 4 the most. They show how to move from simple visualizations to interactions and how to work with parameters of the visualizations. However, one needs to read chapter 2 as well to get some inspiration about which diagrams to use and how to use the visual attentive attributes like color, size or shape – both separately and in combination.

To sum up, if you want to construct a good dashboard, please read the book and I can promise you, it will not dissapoint you.

How do you know if you are disrupted?

Or how do the banks know that…

Recently I’ve read a very interesting book about the disruption that happens in the banking sector. I’ve learnt that this is not the first book about the topic and I wanted to understand how things are actually working with AI and the banking sector.

We’ve published a paper a while back about the introduction to AI for bankers, but I wanted to know more about what has happened since then. The link to the paper is here: https://doi.org/10.2308/jeta-19-04-30-21

So, what’s new about this particular book? Or what’s new about the disruption?

The book talks a lot about the financial sector and how it evolved over the years. Bank 4.0 is a bank which is powered by AI and is able to reason about your economy at a higher abstraction level. Instead of asking “Siri, how much money do I have on my account?”, we can start asking questions like “Siri, can I affort buying the new Playstation 5?” and the assistant will answer that you can, but then you need to cut down on your vacation expenses and maybe replan the purchase of the new mobile phone (which is getting old).

A small disclaimer here: I use Siri as example, this has nothing to do with Apple services (at least nothing that I’m aware of).

Another interesting aspect of banking 4.0 is the fact that financial actors are more trustworthy than the banks. The young generation would rather interact with the actors like WeChat (https://metrics.blogg.gu.se/?p=407) than interact with the bank.

However, what I like best about the book is the fact that it problematizes the concepts related to disruption – for example how do you know that you are being disrupted? or when should a company start pivoting, or even whether pivoting or abandonment of the old model is possible in the company that is being disrupted.

Tools in Agile development

https://arxiv.org/pdf/2004.00289.pdf

Agile software development is here to stay. Software engineering has been changed completely when it arrived and I cannot see how it will change back to waterfall.

Even the so-called post-agile methodologies are built on the same premises – collaboration within teams, empowerement, taking responsibility. Since we educate our future software engineers to be self-organizing, to take responsibility for their topics, projects and personal development, we will live with see a development in this direction.

At the same time, the tools that are needed to support these teams evolve. When I started working as a software engineer, Rational ClearCase was the go-to tool for version control. Now, my students don’t even recognize it. Well, Rational does not exist any more either – the last time I checked it was bought by IBM, no one knows what happens now.

In this paper, the authors take a look at how modern collaboration patterns look like and which tools are used there. They have observed that consolidation of tools (e.g. towards Slack) introduced very positive dynamics to the team. However, they also found that some automated notifications (e.g. from Jenkins) got lost in these new tools, which was not appreciated.

This case study is a longitudinal one, which is a great example of how a research team builds up a long-term collaboration, so important for modern academia and industry!

Testing ML applications

Image by Gordon Johnson from Pixabay

Code: https://github.com/lawrence415610/Mtkeras

I’ve recently looked at the applications of different testing techniques for testing ML applications and got interested in the so called metamorphic testing. The idea is that we can check whether an output is within a specific range or set, which is called a metamorphic relation ( https://medium.com/trustableai/testing-ai-with-metamorphic-testing-61d690001f5c).

What is interesting about this paper is that it presents a framework for testing ML applications. I’ve not tried it yet, but I will as it seems very interesting to check how things work with this metamorphic testing and metamorphic relations. I’ve also interested in how to measure the quality of the software in this context.

https://www.researchgate.net/profile/Zhi_Quan_Zhou/publication/340487456_A_Testing_Tool_for_Machine_Learning_Applications/links/5e8c7ee14585150839c682b9/A-Testing-Tool-for-Machine-Learning-Applications.pdf

Succeeding with large scale measurement programs

This week we had the possibility to give a webinar about how to work with large scale measurement programs. The webinar was dedicated for everyone who works with software metrics and would like to get more impact from that work.

It is not so much about the numbers, it is about the impact and what the numbers mean. The webinar that we present, provides a good understanding of how to make this impact. Based on our experiences, we chose all one needs to know to implement a measurement program in few weeks rather than years.

The webinar has been recorded and is available at this link: https://www.youtube.com/watch?v=2ChaVT_3djE&feature=youtu.be

Recording from the webinar about how to succeed with measurement programs.

Engineers and scientists love to measure. We measure complexity of software, its performance, size and maintainability (just to name a few). We need these measurements in order to construct software, manager organizations or release high quality, high reliable products. However, there is a difference between measuring software aspects and using the measures in decision processes. In this talk, we present the concept of measurement program, measurement system, information quality and indicator-triggered decisions. We show what to consider when setting up measurement programs and provide a hints about the costs and benefits of having the program. We end the talk with presenting recent research results from Software Center, where we combine measurements and machine learning to speed-up software development.

More materials about this are available here:

Book describing our experiences: https://www.springer.com/gp/book/9783319918358
PhD course about software measurement: https://www.springer.com/gp/book/9783319918358
Our website: www.metrics.se
Website of our collaboration: www.software-center.se

A while back we gave a webinar with a similar title, where we focused on the questions concerning the measurement infrastructure, visualization and assessment of the measurement program. The ACM webinar is presented here:

Classifying code smells…

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007%2Fs11219-020-09498-y

Code smells are quite interesting phenomena to study. They are not really defects, but they are not good code either. They exist, but people rarely want to admit to them. There is also no consensus to how much effort it takes to remove them (or even whether they should be removed or just avoided).

In this paper, the authors study whether it is possible to use ML to find code smells. It turns out it is possible and the accuracy is quite high (over 95%). It also shows that sometimes it is better to show a number of recommendations (e.g. two potential smells) rather than one – it requires less accuracy to make the recommendation, but helps the users to narrow-down their solution spaces.