This is the last post that I want to write in 2021. The year has been hectic and full of surprises. First, we got the news that the vaccines work against Covid-19. We all prepared for a return to normal: travelling, visiting friends and family, and attending conferences in person.
Then came the new variants, like Omicron, which seem to escape the vaccines, and countries are still not ready to open up. Conferences get postponed, trips get canceled. I hope this is only temporary and that we will be able to get things under control again.
For the last post of 2021, I chose one of the articles that I’ve recently read – about the use of reinforcement learning in integration testing. It takes a rather different approach from what we do in the Software Center project.
This paper tackles the problem of sparse rewards in the fitness function when using reinforcement learning for test selection. It proposes rewards that combine historical failure data with a function that assigns higher, non-sparse reward values. The work looks very promising, as it has been evaluated on 14 different industrial data sets. I need to check it out during the coming holidays. It’s a project to do for X-Mas.
With that, I would like to thank all of you for being here with me during 2021, and I hope that we can continue in 2022. I wish you all great holidays and the best of luck in the coming year!
From the abstract:
“Reinforcement learning (RL) has been used to optimize the continuous integration (CI) testing, where the reward plays a key role in directing the adjustment of the test case prioritization (TCP) strategy. In CI testing, the frequency of integration is usually very high, while the failure rate of test cases is low. Consequently, RL will get scarce rewards in CI testing, which may lead to low learning efficiency of RL and even difficulty in convergence. This paper introduces three rewards to tackle the issue of sparse rewards of RL in CI testing. First, the historical failure density-based reward (HFD) is defined, which objectively represents the sparse reward problem. Second, the average failure position-based reward (AFP) is proposed to increase the reward value and reduce the impact of sparse rewards. Furthermore, a technique based on additional reward is proposed, which extracts the test occurrence frequency of passed test cases for additional rewards. Empirical studies are conducted on 14 real industry data sets. The experiment results are promising, especially the reward with additional reward can improve NAPFD (Normalized Average Percentage of Faults Detected) by up to 21.97%, enhance Recall with a maximum of 21.87%, and increase TTF (Test to Fail) by an average of 9.99 positions.”
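To get a feel for the idea before I read the paper in detail, here is a minimal sketch of how a historical-failure-density-based reward for test case prioritization could look. Everything in it – the function names, the sliding-window length, the weighting of passed test cases – is my own assumption for illustration; the paper’s actual definitions of HFD, AFP, and the additional reward will differ in detail.

```python
# Illustrative sketch only – not the paper's exact formulas.
# Assumption: for each CI cycle we know, per test case, whether it failed,
# and we keep a sliding window of its recent verdicts (1 = failed, 0 = passed).

from collections import deque

WINDOW = 50  # assumed length of the per-test-case execution history


def historical_failure_density(history: deque) -> float:
    """Fraction of recent executions in which the test case failed."""
    if not history:
        return 0.0
    return sum(history) / len(history)


def cycle_reward(prioritized_suite, verdicts, histories) -> float:
    """Reward for one CI cycle under an HFD-flavoured scheme.

    prioritized_suite: test-case ids in the order chosen by the RL agent
    verdicts: dict mapping test-case id -> 1 if it failed in this cycle, else 0
    histories: dict mapping test-case id -> deque of past verdicts
    """
    reward = 0.0
    for position, tc in enumerate(prioritized_suite):
        failed = verdicts.get(tc, 0)
        density = historical_failure_density(histories[tc])
        if failed:
            # Failing tests ranked early earn more; scaling by (1 - density)
            # gives historically rare failures a stronger signal.
            reward += (len(prioritized_suite) - position) * (1.0 - density)
        else:
            # Small additional reward for frequently executed passing tests,
            # so the reward is not zero in failure-free cycles.
            reward += 0.01 * len(histories[tc]) / WINDOW
        # Update the history window for the next cycle.
        histories[tc].append(failed)
    return reward


# Example with made-up data: "t1" fails in this cycle, "t2" passes.
histories = {
    "t1": deque([0, 1, 0], maxlen=WINDOW),
    "t2": deque([0, 0, 0], maxlen=WINDOW),
}
print(cycle_reward(["t1", "t2"], {"t1": 1, "t2": 0}, histories))
```

The point of the sketch is only to show where the sparse-reward problem bites: if failures are rare, almost every cycle yields a near-zero reward, and the extra terms based on history are there to keep the learning signal alive.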