SE metrics (Software Engineering) – Page 9 – Software engineering, metrics, functional safety …

Legacy code…

I stumbled across a great talk from Dylan Beattie about legacy code. It is a pre-pandemic talk, but it opens with a great song and looks at legacy code differently from how we usually do.

There is a lot of great material and food for thought in this video, but I would like to turn your attention to minute 26, where Dylan talks about Excel and how the world runs on it.

He says that a lot of things are actually built on top of Excel because it is essentially a functional language of sorts. The software developed on top of Excel is also the software that is NOT written by professional programmers and software engineers. Yet, it is prevalent in modern society.

Don’t get me wrong. I am in favor of Excel. Love the tool and what Microsoft has done with it. It is so flexible that it can be used with almost all programming environments – from the built-in VBA (I know, ancient history), to Python or C#. We did our share of Excel programming back in the day, e.g. designed measurement systems based on it: A framework for developing measurement systems and its industrial evaluation – ScienceDirect

I agree, the tool is not perfect, but it is installed on ALL office computers and can be executed by anybody. Just open up the file and run it. That’s why we chose it for the measurement systems. Well, at least until we had to do a big rewrite and go to SQL, dashboards, etc…

As I said – history.

Predicting defects on the line level, article review

IEEE Xplore Full-Text PDF:

A lot has been written about defect prediction, and I’m pretty sure that a lot will be written. It’s one of the research areas which is quite cool to work with because it provides researchers with quite quick results and is relatively quantitative in its nature.

One could also say that this is a holy grail in software development – to predict the location of a defect and fix it before it becomes a problem. It’s a good goal, but it is also a goal that is more like quicksand than a gravel road. Well, for one, not all defects are easy to recognize. Some are not even certain to be defects – sometimes it is not clear how to interpret a requirement, so it’s not easy to say if a piece of code is implementing it correctly or not.

In this paper, the authors have done a great job of creating a system to predict defect locations at the line level – DeepLineDP. The requirements for the system are partially based on a survey conducted by the authors with developers.

According to the authors: “DeepLineDP is 14%-24% more accurate than other file-level defect prediction approaches; is 50%-250% more cost-effective than other line-level defect prediction approaches; and achieves a reasonable performance when transferred to other software projects. These findings confirm that the surrounding tokens and surrounding lines should be considered to identify the fine-grained locations of defective files (i.e., defective lines).”

I like this work and I recommend everyone interested in how to use deep learning for code tasks to look at this work.

Our team has done some of these investigations ourselves. You can watch them on YouTube here:

Post-Corona branding…

Post Corona: From Crisis to Opportunity: Galloway, Scott: 9780593332214: Amazon.com: Books

Good holiday reading is an essential addition to the time spent with family and friends. Every year I try to get hold of a good book to get inspiration for the upcoming year. Last year, I read “Grit”, which is about perseverance. This year, I noticed a book by NYU Stern professor Scott Galloway – “Post Corona: From Crisis to Opportunity”. Galloway is also the author of “The Four”, which is a book about Apple, Google, Facebook, and Amazon.

Now, to the topic of the day – the post-corona book. I read this book a bit slower than I usually do (which is a good thing). When I read it I had Galloway’s voice in my head talking about the opportunities of large companies – big tech, as he calls them. His thesis is that the pandemic actually accelerated their growth to a size which makes them really hard to disrupt. Through mergers and acquisitions, they can “cannibalize” their competition, unless the competition is them.

I thought that this book would be about Zoom – a company unheard of before the pandemic, now a synonym for a phone call. I thought that the book would be about health services and telemedicine – another area that was small and now is big. No, it was nothing like that. The book was about The Four and how they capitalize on their brands in times of pandemic.

There is a thesis out there that if you are getting something for free, it’s not worth much. In this book, Galloway popularizes another thesis – if you are getting something for free, you are the product, not the consumer. He uses this as a way of explaining why Apple charges so much for its products – for not using our data, whereas Google and Facebook/Meta capitalize on our data. Apple collects 200 data points per day from us, while Google collects 2000 data points per hour – a small difference.

I’m not a privacy freak, but I do not want to be a product unless I choose to. I do not want companies to monetize me, my behavior, and my family. But, and that’s a sad thing, I do want to have great services for a reasonable price. I want my maps to work well – the one in the car’s GPS simply does not cut it. I want to watch short tutorials on YouTube – Netflix does not produce tutorials about variational autoencoders (yet).

To sum up, I like the thesis posed by Galloway that the next big thing taken up by Amazon will probably be medical insurance or schools. It is not difficult to see that the telemedicine model is essentially mature enough to be disrupted. I really recommend this book as food for thought in the post-pandemic (or endemic) world of 2022.

Test prioritization – a systematic review (review)

Test case selection and prioritization using machine learning: a systematic literature review (springer.com)

Testing is an important activity in every software engineering project. In professional organizations, the process is structured and well-organized. In smaller projects, start-up style organizations, or in research studies, the process is less organized.

There are different views on why we do testing. Some think that we do testing to find defects, some to prove that the software works correctly, and some think that we do it to waste time (well, not so many maybe). In my experience it is a combination of the first and the second. We do testing to find defects and also to track how good our software gets over time (software reliability growth modelling).

This paper presents a systematic literature review on using machine learning to select and prioritize test cases. I think that the authors summarize their contribution in a very good way (quote):

The main ML techniques used for TSP are: supervised learning (ranking models), unsupervised learning (clustering), reinforcement learning, and natural language processing.
ML-based TSP techniques mainly rely on features that are easy to compute and based on data that are practical to collect in a CI context, including execution history, coverage information, code complexity, and textual data.
ML-based TSP techniques are evaluated using a variety of metrics that are, sometimes, calculated differently in TS and TP, making it difficult to compare their results. Most of the currently available subjects have extremely low failure rates, making them unsuitable for evaluating ML-based TSP techniques.
Comparing the performance of ML-based TSP techniques is challenging due to the variation of evaluation metrics, test suite sizes, and failure rates across studies. Reporting failure rates alongside performance values helps provide more interpretable results to the wider research community.
Only six out of the 29 selected studies (21%) can be considered reproducible, thus raising methodological issues in the studies and a lack of confidence in reported results.
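To make the "execution history" feature from the list above more concrete, here is a minimal sketch of history-based prioritization. It is a hand-written heuristic standing in for a learned ranking model, not a technique taken from the reviewed paper. The names `CaseHistory`, `failure_score`, `prioritize`, the `decay` parameter, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaseHistory:
    """Hypothetical record of one test case across past CI cycles."""
    name: str
    # Oldest run first, most recent run last; True means the test failed.
    history: list = field(default_factory=list)


def failure_score(history, decay=0.8):
    """Exponentially weighted failure rate: recent failures count more."""
    score, weight, total = 0.0, 1.0, 0.0
    for failed in reversed(history):  # start with the most recent run
        score += weight * (1.0 if failed else 0.0)
        total += weight
        weight *= decay
    return score / total if total else 0.0


def prioritize(records, decay=0.8):
    """Return test names ordered from highest to lowest failure score."""
    ranked = sorted(records, key=lambda r: failure_score(r.history, decay), reverse=True)
    return [r.name for r in ranked]


# Made-up execution histories, for illustration only.
records = [
    CaseHistory("search", [False, False, False, False]),
    CaseHistory("checkout", [True, False, False, False]),
    CaseHistory("login", [False, False, False, True]),
]
order = prioritize(records)  # "login" failed most recently, so it runs first
```

An ML-based approach would replace `failure_score` with a model trained on such features (possibly together with coverage, complexity, or textual data), but the input and output have the same shape: per-test features in, an ordering out.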

I think the biggest surprise, for me, is that complexity-based metrics are still used widely in this context. I’m happy that there are new approaches on the rise, for example textual analyses. I guess there is a point in combining approaches, but complexity seems like a very coarse-grained instrument for this type of analysis. We know it correlates well with size, and the larger the test (or UUT), the higher the probability of triggering a failure.
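The link between complexity and size is easy to see even in a toy measurement. The sketch below counts branching constructs as a rough approximation of cyclomatic complexity and compares it with the number of lines per function. The function `complexity_and_size`, the choice of `BRANCH_NODES`, and the sample code are assumptions made for this illustration, not a standard tool, and nested functions are counted as part of their parent.

```python
import ast

# Node types treated as decision points in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def complexity_and_size(source):
    """Return {function name: (approximate complexity, lines of code)}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            loc = node.end_lineno - node.lineno + 1
            result[node.name] = (1 + branches, loc)
    return result


# Made-up sample code, for illustration only.
SAMPLE = '''
def small(x):
    return x + 1

def large(x):
    if x < 0:
        return -1
    for i in range(x):
        if i % 2 == 0:
            x += i
    return x
'''

metrics = complexity_and_size(SAMPLE)
```

Here the longer function also gets the higher complexity score, which is the pattern that makes complexity a coarse-grained signal: it may tell a ranking model little that size does not already tell it.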

Well, I guess I need to run more experiments myself to check if I am missing something.

Merry X-mas and the next year with AI


Sparse reward for reinforcement learning‐based continuous integration testing – Yang – Journal of Software: Evolution and Process – Wiley Online Library

This is the last post that I want to write in 2021. The year has been hectic and full of surprises. First, we got the news that the vaccine works for Covid-19. We all prepared for normalization, for being able to travel, visit friends, families, and conferences in person.

Then came the new variants, like Omicron, which seem to escape the vaccine, and countries are still not ready to open. Conferences get postponed, trips canceled. I hope this is just a temporary situation and that we will be able to get control of the situation again.

For the last post in 2021, I chose one of the articles that I’ve recently read – about the use of reinforcement learning in integration testing. Kind of a different approach to what we do in the Software Center project.

This paper tackles the problem of sparse rewards for fitness functions when using reinforcement learning for test selection. It proposes a combination of historical data and a function that assigns a higher reward for non-sparse data. It looks like the work is very promising, as it has been tested on 14 different industrial data sets. I need to check it during the coming holidays. It’s a project to do for X-mas.

With that, I would like to thank all of you for being here with me during 2021 and hope that we can continue in 2022. Wish you all great holidays and the best of luck in the coming 2022!

From the abstract:

“Reinforcement learning (RL) has been used to optimize the continuous integration (CI) testing, where the reward plays a key role in directing the adjustment of the test case prioritization (TCP) strategy. In CI testing, the frequency of integration is usually very high, while the failure rate of test cases is low. Consequently, RL will get scarce rewards in CI testing, which may lead to low learning efficiency of RL and even difficulty in convergence. This paper introduces three rewards to tackle the issue of sparse rewards of RL in CI testing. First, the historical failure density-based reward (HFD) is defined, which objectively represents the sparse reward problem. Second, the average failure position-based reward (AFP) is proposed to increase the reward value and reduce the impact of sparse rewards. Furthermore, a technique based on additional reward is proposed, which extracts the test occurrence frequency of passed test cases for additional rewards. Empirical studies are conducted on 14 real industry data sets. The experiment results are promising, especially the reward with additional reward can improve NAPFD (Normalized Average Percentage of Faults Detected) by up to 21.97%, enhance Recall with a maximum of 21.87%, and increase TTF (Test to Fail) by an average of 9.99 positions.”
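The abstract reports its results in NAPFD, a normalized variant of APFD. To give a feeling for what such a metric rewards, here is a minimal sketch of plain APFD for one prioritized test run. It is not the paper's implementation. It assumes that every failing test reveals exactly one distinct fault, and the function name `apfd` and the sample orderings are made up for illustration.

```python
def apfd(ordering, failing):
    """Average Percentage of Faults Detected for one prioritized test run.

    ordering: test names in execution order.
    failing:  set of test names that fail; each failing test is treated
              as revealing one distinct fault (an assumption of this sketch).
    """
    n = len(ordering)
    m = len(failing)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one failing test")
    # 1-based positions at which the failing tests are executed.
    positions = [i for i, name in enumerate(ordering, start=1) if name in failing]
    if len(positions) != m:
        raise ValueError("every failing test must appear exactly once in the ordering")
    return 1 - sum(positions) / (n * m) + 1 / (2 * n)


# Made-up example: five tests, two of which fail.
failing = {"t2", "t5"}
good_order = ["t2", "t5", "t1", "t3", "t4"]  # failures scheduled first
bad_order = ["t1", "t3", "t4", "t2", "t5"]   # failures scheduled last

good_score = apfd(good_order, failing)
bad_score = apfd(bad_order, failing)
```

An ordering that runs the failing tests early scores higher than one that runs them late. With low failure rates, most CI cycles contain no failing test at all, so a score like this gives the learner nothing to work with in those cycles, which is the sparse reward problem the abstract describes.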

A Friday research and pedagogy reflection post…

It’s Friday again and I’m trying to pack things up for the weekend. While doing that I reflected a bit on the week that passed. It started with the meetings on research directions, but it ended in discussing and thinking about pedagogy.

At the beginning of the week, I focused on preparing for an evaluation of a tool, read about VAEs and the disentanglement problem, and looked at the new datasets. It’s all cool and interesting and kind of on the edge. It is also at such a stage that it works mostly for the well-known and annotated datasets, while it works a bit worse on the datasets that come from real life – e.g. from driving a car in the city, where there are tens of objects in the picture.

However, my week ended by talking about pedagogy. I’ve had a chance to listen to our excellent teachers at the University of Gothenburg and get their reflections on the year that passed. To be honest, I did not see that coming and I did not expect what I heard. Many positive things, but also a confirmation that we, as a university, focus too little on pedagogy and teaching. It’s the third time I get to reflect on this, so I need to do something about it.

Second, I also listened to, and reflected upon, the challenges of Ph.D. students today. They need to publish at an increasingly high tempo. As our discipline matures, the quality of publications increases and so do the requirements for the Ph.D. students. They also face an uncertain future as the research funding decreases, the number of positions decreases, and the tenure-track positions are no longer “forever”.

There are also highlights of this week. We had a great discussion at one of our steering groups about the companies involved in our research (which is impressive). We also got a number of new research projects associated, we got research results and, finally, the ALC (Active Learning Classroom) has been finished.

With that, my friends, I leave off for the weekend.

Noisy data, biased data – book review

Noise: A Flaw in Human Judgment: Kahneman, Daniel, Sibony, Olivier, Sunstein, Cass R.: 9780316451406: Amazon.com: Books

It’s been a while since I wrote my last post. Well, hectic times I guess. Old friends leaving the spot, new friends entering the spot – the life of a researcher.

While working on my recent research projects, I was wondering about one thing – is there a correlation between noise in data and noise in judgement/decisions?

Let me explain the problem first. In a perfect world, in a galaxy far, far away, all data is perfect. All pictures are labelled correctly, natural language has a formal meaning and all data points are assigned to their classes perfectly. In this perfect world, the interpretation of the data is also unambiguous and independent of who does the interpretation. In that perfect world, this means that machines can take all decisions and we, as humans, can relax.

But we do not live in that perfect world. In our world, there is data that is not always correct and the language is imprecise. We are also biased by many factors, as humans. In this world of ours, this means that a lot of things are a “judgement call”, which means that training a machine to take decisions is not always correct.

So, I was thinking, if we clean up the noise, will the decisions be unbiased? If we train the persons making decisions, will the decisions be more correct?

I’ve looked at one of the recent works of the Nobel Prize winner (Daniel Kahneman) and his colleagues. They describe what noise and bias are, where they come from, and how to find them. This book builds upon the principles of statistical error (and its measurement) as well as our ability to handle the error in terms of the ‘wisdom of the crowd’. It also shows how using more processes reduces bias and introduces order to the chaos of our galaxy.
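The statistical view of error can be made concrete with the standard decomposition of mean squared error into squared bias plus variance, where the variance plays the role of noise. The sketch below is only an illustration of that identity. The function names `decompose_error` and `average_in_groups` and the judgment values are made up, and the pairing of judgments is chosen so that the averaging effect is easy to see.

```python
import statistics


def decompose_error(judgments, true_value):
    """Return (bias, noise squared, mean squared error) for a set of judgments."""
    mean_judgment = statistics.fmean(judgments)
    bias = mean_judgment - true_value
    # Population variance around the mean judgment: the "noise" part.
    noise_sq = statistics.pvariance(judgments, mu=mean_judgment)
    mse = statistics.fmean([(j - true_value) ** 2 for j in judgments])
    return bias, noise_sq, mse


def average_in_groups(judgments, group_size):
    """Replace individual judgments with the mean of each consecutive group."""
    return [
        statistics.fmean(judgments[i:i + group_size])
        for i in range(0, len(judgments), group_size)
    ]


# Made-up data: six people estimate a quantity whose true value is 9.
judgments = [12, 8, 11, 9, 13, 7]
true_value = 9

bias, noise_sq, mse = decompose_error(judgments, true_value)

# "Wisdom of the crowd": average pairs of judgments before deciding.
crowd = average_in_groups(judgments, 2)
crowd_bias, crowd_noise_sq, crowd_mse = decompose_error(crowd, true_value)
```

In this toy data set, the mean squared error equals the squared bias plus the noise term, and averaging the judgments in pairs removes the noise while leaving the bias untouched. Cleaning up noise and removing bias are therefore two separate jobs.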

I would like to leave you with this thought – we have the whole Agile software development movement, focused on humans and products, not processes. But if it is the processes that actually bring some order, aren’t we just introducing more chaos by being more Agile?

“That will never work” – A book about Netflix

That Will Never Work: The Birth of Netflix by the first CEO and co-founder Marc Randolph : Randolph, Marc: Amazon.se: Böcker

Building a successful start-up seems like a really cool idea – from a distance. I used to teach a course about entrepreneurship, start-ups, business models and the like. Although it was nice, I always felt that I’m a person who knows absolutely nothing about this. At least not in practice…

In this book, the original founder of Netflix tells his story about how he took the idea and made it into a product. He tells the story about how the idea hatched, how he, and his team, created a data-driven model of understanding their customers. The book is also about the struggles of start-ups – about taking on investments from the beginning and then being pushed out of the company. It’s about being able to understand what’s best for the company and what’s best for the individual.

I like the way in which the author describes the story, and also shows a bit of himself: how he felt, how he wanted to build the company and how he decided when to leave (with grace!). I also like his ending of the book – “Nobody knows anything!” – which is a saying that you never really know what will and will not work in the end.

I recommend this as a Sunday reading to get inspired.

Are software architecture and code the same?

Relationships between software architecture and source code in practice: An exploratory survey and interview – ScienceDirect

Software architecting is one of the crucial activities for the success of your product. There is a BAPO model, where B stands for Business and A for Architecture – and there is a good reason why it is in second place. It should not dictate your business model, but it should support it.

Well, it is also good that the architecture comes before processes and organization. If software is your product, then it should dictate how you work and how you are organized.

But how about the software code? For many software programmers and designers, the architecture is a set of diagrams which show logical blocks and software organization, but they are not the ACTUAL code, not the product itself. In one of our research projects we study exactly that kind of problem – how to ensure that we keep both aligned, or more accurately, how we can use machine learning to keep the code and architecture synchronized.

Note that I use the word synchronized, not aligned or updated. This is to avoid one of many misconceptions about software architectures — that they are set once and for all. Such an assumption is true for architectures of buildings, but not software. We are, and should be, more flexible than that.

In one of the latest Information and Software Technology issues, I found this interesting study. It is about how architects and programmers perceive software architectures. It shows how architectures evolve and why they are often outdated. It is a survey and I really like where it’s going. I strongly recommend reading it if you are into software architectures, programming and the technical side of software engineering…

Cybersecurity, security, and safety…


During the spring semester, my students did great work looking into the security of a car’s electrical system. They managed to decode signals, understand high-level data, and perform small changes in the car’s function.

It all sounds great as a thesis project. Both the students and the company loved this project. It was challenging, it was new, it was useful. But I’m not writing this post about that. I want to write about what has happened, or not happened, after that.

In the months that came after the thesis, I decided to look into mechanisms for how to design and implement secure software. Being a programmer at heart, I turned to GitHub for help. I searched for tools and libraries for secure software design. I know, I could have searched for something different, but let’s start there.

The results were:

Analysis frameworks:

There were more of these, but mostly of the same kind. I was a bit amazed by the fact that there is so little outside of web design. I also looked at some of the research in this area (no systematic review, I promised myself not to do one). There I found all kinds of work, but mostly theoretical. The areas of interest:

Cryptography: how to encode/decode information, keys, passwords.
Secure software design: mostly analysis of vulnerabilities.
Secure systems: mostly about passwords and vulnerabilities.
Privacy: how to keep the private information hidden from third parties (kind of security, but mostly something else – I’m still waiting to understand what).
Legacy operations: how to make the software long-lived and provide it with secure infrastructure.
Infrastructure: security of the cloud environments, end-to-end security.

Since I worked with software safety, I thought that it would be very similar. However, it was not. The safety community discussed, mostly, standardization, hazards, risks. Very little about code analysis, finding unsafe code, etc. So, mostly something different.

I’ll keep digging and I will run a few experiments with some of my students to understand what the technology could be. However, I’m not as optimistic as I was at the beginning of my search.