The speed of light and laws of software engineering…

Image by ParallelVision from Pixabay

While on vacation, I managed to watch a number of sci-fi movies that I had wanted to see but never found the time for during the academic year. This got me thinking about certain laws of physics and the laws of software engineering. I think there are many similarities, so let me start with the speed of light.

First of all, the speed of light is the fastest speed we know of, and Einstein’s theory of relativity says that we cannot travel faster than it. Even if we could reach it, travelling at such a speed would be tremendously difficult:

  • How would we know where we are going if we cannot see where we are going (we would, quite literally, travel as fast as we can see)? The trip would be very fast and, probably, very short.
  • If we could somehow see where we are going, or detect an obstacle (e.g. by predicting its position from prior observations), how would we steer? At that speed, for the majority of the travel time we would be moving in straight lines. These straight lines would be similar to either the hyperjumps from Star Wars or the “clicks” from Guardians of the Galaxy.
  • If we are literally the fastest objects around, how could others see us and avoid us? Is it possible to avoid light? No, it is not. This means that we would be utterly chaotic for everything around us.

Therefore, I think that, even if we could, travelling at the speed of light is probably not the best idea. At least not as we conceive of it today, which may change as we ourselves change.

How does this relate to the laws of software engineering, then? Well, I would start with the laws of complexity.

For a while now, I have been broadcasting the opinion that the complexity of software cannot be reduced, only hidden. Complex problems need complex software, and complex software cannot be simple. If our algorithms have many conditions, we cannot take them away; we can hide them in functions, but we can never get rid of them. That is the first parallel – we cannot travel faster than the speed of light.
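To illustrate what I mean by hiding, here is a toy example of my own (not from any particular system): the pricing rules below cannot be removed if the problem needs them, only moved behind a name.

```python
def discount_rate(customer_type: str, order_value: float) -> float:
    """All the conditional complexity lives here, hidden behind one name."""
    if customer_type == "vip" and order_value > 1000:
        return 0.20
    if customer_type == "vip":
        return 0.10
    if order_value > 1000:
        return 0.05
    return 0.0


def final_price(customer_type: str, order_value: float) -> float:
    # The caller looks simple, but the complexity has only moved, not vanished.
    return order_value * (1.0 - discount_rate(customer_type, order_value))


print(final_price("vip", 1500.0))  # 1200.0
```

The caller of final_price sees one simple expression, but every condition is still there, one level down.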

We can hide complexity, and thus make the program/software easier to understand and maintain, but the better we hide it, the harder it becomes to anticipate it. Packaging complex algorithms in simple building blocks makes modifications harder – or rather, not the modifications themselves, but overseeing the consequences of those modifications.

If we simplify the program/algorithm too much, we have to expect that it will produce erroneous results for some cases – again, complex problems require complex programs. An example of such an issue is approximating continuous functions: since our computers are discrete, there is always some degree of error in the approximation.
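A minimal sketch of that discretization error (my own toy example): approximating the integral of a continuous function with a finite sum gets closer as we add steps, but for any finite number of steps some error always remains.

```python
import math

def midpoint_integral(f, a, b, n):
    """Approximate the integral of f on [a, b] with an n-step midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The integral of sin(x) on [0, pi] is exactly 2.0; watch the error shrink
# with more steps, without ever reaching zero.
for n in (10, 100, 1000):
    approx = midpoint_integral(math.sin, 0.0, math.pi, n)
    print(n, approx, abs(approx - 2.0))
```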

Finally, interconnectivity and modularity as means of handling complexity have their limits. I do not think we can keep developing increasingly complex software simply by increasing its size; that will become difficult in the long run. We need to make sure that we have the competence to handle complexity, and we need to be able to make that complexity apparent.

autoML – let’s talk about it…

Image from Pixabay

AutoML: a promise of green pastures, less work, and optimal results. So, is it really like that? In this post I share my view and my experience from running a first test with one such framework.

First of all, let’s be honest: there is no such thing as a free lunch. In the case of autoML (auto-sklearn), the first part of the price tag is the effort, skills and time needed to install it and make it work. The second is the performance… It is painfully slow compared to training your own models, simply because it tests a large number of models along the way, and it also takes a long time just to download and set everything up.

But, first things first, let me tell you where I started. I used the data from the MicroHRV project (3. MicroHRV: Recognizing Rare Events in Microwave Radio Links and Intensive Care Units using Machine Learning – Software Center (software-center.se)). The data comes from patients undergoing surgery to remove blood clots from the brain (as dangerous as it may sound, the actual procedure is planned and calm). I wanted to check whether autoML could do better than what we have at the moment.

What we have at the moment (for that particular dataset) is: Accuracy: 0.98, Precision: 0.98, Recall: 0.98 – using a Random Forest classifier. This is already very good. For the medical domain, it is in a class of its own, given that our previous studies ended up with ca. 0.7 accuracy at best.
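For context, the baseline evaluation looks roughly like this – a minimal sketch using a synthetic stand-in for the MicroHRV data, which is not public:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the (non-public) MicroHRV features and labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_test, y_pred, average="weighted"))
print(confusion_matrix(y_test, y_pred))
```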

When it comes to installing autoML – if you enjoy Stack Overflow, downgrading, upgrading, compiling, etc., and you run Windows 10, then it’s your heaven. If you run Linux – no problem. Otherwise – stick to manual analyses :)

After two days (and nights) of trying, the best configuration was:

  • WSL – Windows Subsystem for Linux
  • Ubuntu 20, and
  • countless OSS libraries

It takes a while to get it to work; the question is whether the results are good enough…
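For reference, the run itself boils down to something like this – a minimal sketch using auto-sklearn’s documented interface, with the same synthetic placeholder data as in the Random Forest sketch above; the time limits only roughly match my actual run:

```python
import autosklearn.classification
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the (non-public) MicroHRV data, as before.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3 * 3600,  # total search budget, in seconds
    per_run_time_limit=300,            # cap per candidate model, in seconds
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_test, y_pred, average="weighted"))
```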

After three hours of waiting, a lot of heat from my laptop, and over 1,000 models tested, the result was: Accuracy: 0.91, Precision: 0.94, Recall: 0.91.

So, worse than my manual selection of models. I include the confusion matrices.

Confusion matrix from autoML
Confusion matrix from the Random Forest

The matrices are not that different, since the validation sets are not that large either. Still, the Random Forest seems to beat the best model from autoML.

I need to work more on this and see if I am doing something wrong. For now, I take it as a success – I am still better than autoML (there is some use for an old professor yet) – rather than a let-down about not getting better results.

At the end of the day, 0.98 accuracy is still very good!

Reproducing AI models – a guideline

Image by Pete Linforth from Pixabay

2107.00821.pdf (arxiv.org)

Machine learning has become a great tool in software engineering, for both research and development. Having access to TensorFlow, PyTorch, and other toolkits provides almost endless possibilities. Combine that with the hundreds (if not thousands) of datasets on Zenodo and Co., and you can train a model for almost anything.

So far, so good, I would say. Problems (yes, there are always some problems) appear when we want to reproduce the results of others. Training a model on your own dataset and making it available is easy. Trusting such a model in a new context is not.

Imagine an ML model trained on data from Company X. We have probably tuned its parameters a lot, so the model works great there – but does it work for Company Y? Most probably not. Well, it will run, but the performance of its predictions is not going to be great.
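A toy sketch of that effect (synthetic data, not from any real company): a model tuned on one dataset and then applied to data with a different underlying relationship between features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# "Company X" and "Company Y" stand-ins: different feature/label relationships.
X_x, y_x = make_classification(n_samples=2000, n_features=20, random_state=1)
X_y, y_y = make_classification(n_samples=2000, n_features=20, random_state=2)

Xtr, Xte, ytr, yte = train_test_split(X_x, y_x, test_size=0.3, random_state=1)
model = RandomForestClassifier(random_state=1).fit(Xtr, ytr)

print("Company X (same context):     ", accuracy_score(yte, model.predict(Xte)))
print("Company Y (different context):", accuracy_score(y_y, model.predict(X_y)))
```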

So, Google has partnered up with academic groups to set up SIGMODELS and TensorFlow Garden, initiatives aimed at making ML models more portable, experiments more replicable, and all the other goodies.

In this paper, the authors provide a set of checks that we can use to make models more transparent, which is the first step towards reproducibility. In these guidelines, they advocate reporting the models’ architecture, their input and output structure, their building blocks, loss functions, etc.

Naturally, they also recommend reporting the metrics that were used to optimize the models, e.g. accuracy, F1-score, MCC and others. I know these are probably essentials, but you would be surprised how many authors do not actually report them. If they are omitted, how do we know whether the values were simply so poor that the authors left them out (low performance of the model), or whether they just were not relevant (low relevance of the metrics – which would be the more benign explanation)?
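As a minimal illustration of the kind of reporting the guidelines ask for – the field names below are my own placeholders, not the paper’s checklist, and the metric values are borrowed from my earlier Random Forest post purely as an example:

```python
import json

# Hypothetical, minimal "model report"; adapt the fields to your own model.
model_report = {
    "architecture": "RandomForestClassifier, 100 trees",
    "input": "20 numeric features per observation",
    "output": "binary label: event / no event",
    "training_criterion": "Gini impurity",
    "optimized_metrics": {"accuracy": 0.98, "precision": 0.98, "recall": 0.98},
    "training_data": "proprietary dataset, not publicly available",
    "known_limitations": "describe sampling, labelling and context limitations here",
}

print(json.dumps(model_report, indent=2))
```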

For now, these guidelines are only a draft, but I hope that they will become mainstream, just like the empirical guidelines from ACM (GitHub – acmsigsoft/EmpiricalStandards: Empirical standards for conducting and evaluating research in software engineering).

News from ML?

I seldom write about films and events – well, maybe actually never – but this year a lot has happened online.

What’s new in Machine Learning | Keynote – YouTube

The video above covers the news from Google about the TensorFlow library, which includes new ways of training models, compression and performance tuning, and more.

TensorFlow Lite and TensorFlow.js allow us to use the same models we run on desktops, but on mobile devices and in the browser. Really impressive. I’ve caught myself wondering whether I am more impressed by the hardware capabilities of small devices or by the capabilities of the software. Either way – super cool.
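To give a feeling for what moving a model to a small device involves, here is a minimal sketch using the standard TensorFlow Lite converter on a placeholder Keras model (my own example, not tied to anything announced in the keynote):

```python
import tensorflow as tf

# Placeholder Keras model standing in for whatever you trained on the desktop.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert the model to the TensorFlow Lite format used on mobile/embedded devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```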

Google is not the only company making announcements. NVIDIA is also showing a lot of cool features for enterprises, with cloud access for rapid prototyping, model testing and deployment at the centre of it.

NVIDIA Executive Keynote for Enterprise AI at COMPUTEX 2021 – YouTube

I like gaming, so this is impressive, but even more impressive is last year’s DLSS technology, which still cannot be beaten by the competition. Really nice.

New programming tools?

Glinda: Supporting Data Science with Live Programming, GUIs and a Domain-specific Language (acm.org)

I’m not going to add a picture here, because the actual paper contains a great one, which is copyrighted. But do we really need another tool (I thought)? If you think like that… well, think again.

Once I looked at the paper, I really liked the idea. This is a tool that combines the programming tasks of software engineers with tasks such as data exploration, labelling and cleaning. It is a bit like Jupyter Notebook, but it lets you interact with the data in a deeper way.

I strongly recommend taking a look at the tool. I’ve done a quick check and it looks really nice.

What makes a great code maintainer…

Image by Rudy and Peter Skitterians from Pixabay



ICSE2021_B.pdf (igor.pro.br)

For many of us, software engineering is about the possibility of creating new projects, new products and cool services. We do that often, but we equally often forget about maintenance. Well, maybe not forget, but we deliberately choose not to think about it. That is natural, as maintaining old code is not seen as particularly interesting.

Reading this paper, I realized that my view of maintenance is a bit dated. In my time in industry, maintenance was mostly “bug-fixing”. Today, it is much more about community work. As the abstract of the paper says: “Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many other activities beyond coding. Maintainers should care about fostering a community, helping new members to find their place, while also saying “no” to patches that although are well-coded and well-tested, do not contribute to the goal of the project.”

The paper is based on a series of interviews with software maintainers. In short, the results are that great software maintainers are:

  • Available (short response times),
  • Disciplined (follows the process),
  • Aware of the global picture of what the review should achieve,
  • Communicative,
  • Empathetic,
  • Community-building,
  • Technically excellent,
  • Quality-aware,
  • Experienced in the domain,
  • Motivated,
  • Open-minded,
  • Patient,
  • Diligent, and
  • Responsible.

It is a long list, and the priority of each of these characteristics differs from one reviewer to another. What matters is that we see the software maintainer as a social person who contributes to the community, rather than someone who sits in a dark office and reads code all day long. Maintainers are really the people who make software engineering groups work well.

After reading the paper, I’m more motivated to maintain the community of my students!

Data labelling – the activity that makes people hate ML…

Image by S. Hermann & F. Richter from Pixabay

Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies | SpringerLink

Machine learning is hungry for data. The more you have, the happier it will be. It all seems very easy when we are learning how to program ML and how it works – there are plenty of open data sources to practice and learn from.

However, when we want to use ML for our own purposes, things get a bit more complicated. There is a lot of data, but not in the right format. The data that is in the right format is incomplete. The data that is complete is noisy. The data that is not noisy is too small, so we need to collect more. And so the story goes on, and on, and on…

Collecting the data is not that problematic, as it can often be automated – at least in software engineering, automotive, telecom, transport/logistics and medicine, the domains I know. What is problematic, though, is data labelling. This is the activity where we take each data point and assign it a class, or a label if we speak machine-learnish. The person doing the labelling needs to be competent enough to label the data correctly – they need to know the domain, the data and the context. They also need a fantastic memory, because the labels must be consistent, and the labels need to be unambiguous given the underlying feature vector.
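One simple way to make that consistency requirement concrete (my own illustration, not from the paper) is to have two people label the same sample and measure their agreement, for example with Cohen’s kappa:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for the same ten data points from two annotators.
annotator_a = ["event", "event", "normal", "normal", "event",
               "normal", "normal", "event", "normal", "normal"]
annotator_b = ["event", "normal", "normal", "normal", "event",
               "normal", "event", "event", "normal", "normal"]

# Cohen's kappa corrects raw agreement for chance; values near 1.0 suggest
# consistent labelling, values near 0 suggest the labels are not reliable.
print("Agreement (kappa):", cohen_kappa_score(annotator_a, annotator_b))
```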

In this paper, colleagues from our department study the process of data labelling and its challenges.

Among the challenges they identify are:

  • Lack of a systematic approach to labeling data for specific features
  • Unclear responsibility for labeling
  • Noisy labels
  • Difficulty to find a correlation between labels and features
  • Skewed label distributions
  • Time dependence
  • Difficulty to predict future uses for datasets

I think it is great work, and a good read for everyone who wants to get into ML for real, start using it at a company, and understand whether it actually gives any benefit.

From the abstract: Labeling is a cornerstone of supervised machine learning. However, in industrial applications, data is often not labeled, which complicates using this data for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these still do not provide accurate and reliable labels for every machine learning use case in the industry. In this context, the industry still relies heavily on manually annotating and labeling their data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using a semi-structured interview with data scientists at two companies to explore their problems when labeling and annotating their data. This paper provides two contributions. We identify industry challenges in the labeling process, and then we propose mitigation strategies for these challenges.

Action research in open source – a nice article

Image by Amílcar Vanden-Bouch from Pixabay

https://link.springer.com/article/10.1007/s10664-020-09849-0?utm_source=toc&utm_medium=email&utm_campaign=toc_10664_25_5&utm_content=etoc_springer_20200904

I have been looking for good examples of articles about action research in software engineering for a while. There are quite a few coming from the participatory design community and from ethnography in software engineering.

This paper is an example of how one can conduct action research together with an open source community. It shows how to do the research while being part of the community and adds a new angle to the topic – how to democratize the research design. In contrast to company-based development, an open source community is free to accept or reject new ways of working, so it can be challenging to make the action happen at all.

Figure 1 in the paper shows the process in more detail, and I strongly recommend taking a look at it. The process starts with the design of the interventions, which takes community requirements, similar communities, best practices and known problems as input. The “similar communities” input is new and important, as it helps to leverage good practices that other communities have already adopted.

The methodology has already been evaluated, and the evaluation shows that it is a valid and interesting new research method!

Abstract: Participatory Action Research (PAR) is an established method to implement change in organizations. However, it cannot be applied in the open source (FOSS) communities, without adaptation to their particularities, especially to the specific control mechanisms developed in FOSS. FOSS communities are self-managed, and rely on consensus to reach decisions. This study proposes a PAR framework specifically tailored to FOSS communities. We successfully applied the framework to implement a set of quality assurance interventions in the Robot Operating System community. The framework we proposed is composed of three components, interventions design, democratization, and execution. We believe that this process will work for other FOSS communities too. We have learned that changing a particular aspect of a FOSS community is arduous. To achieve success the change must rally the community around it for support and attract motivated volunteers to implement the interventions.

Finding lines of code that require review – my 100th blog post!

Image by skeeze from Pixabay

Working with continuous integration is an exciting new field. You get your code into the main branch directly – well, that is what the theory says. What you really get directly is feedback, at least the feedback from the automated checks for technical debt, testing and the like.

What you do not get quickly is the review of your code by your colleagues. In larger organizations, code reviews do not get prioritized, so they tend to slow software development down rather than speed it up!

In this paper, we set out to understand how to fix that. We extracted code reviews from Gerrit and trained classifiers to point out which lines of code need a manual review, instead of reviewing all of the lines. Here is a short video about it: https://play.gu.se/media/t/0_h7hx95d2
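To give a feeling for the idea, here is a much simplified sketch (not the pipeline from the paper, which uses its own feature extraction and a CNN): represent each line of code as token counts, train a classifier on which lines were flagged in past reviews, and evaluate with MCC.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Hypothetical toy data: lines of code and whether a reviewer flagged them
# (1 = needs manual review). Repetition only provides enough samples to split.
lines = [
    "if user is None: return None",
    "result = compute(a, b)",
    "except Exception: pass",
    "logger.info('done')",
    "data = eval(request.body)",
    "return render(request, 'index.html')",
] * 50
labels = [0, 0, 1, 0, 1, 0] * 50

X = CountVectorizer(token_pattern=r"\w+").fit_transform(lines)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=7)

clf = AdaBoostClassifier(random_state=7).fit(X_train, y_train)
print("MCC:", matthews_corrcoef(y_test, clf.predict(X_test)))
```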

The abstract of the paper is included below:

Code reviews are one of the first quality assurance tasks in continuous software integration and delivery. The goal of our work is to reduce the need for manual reviews by automatically identify which code fragments should be further reviewed manually. We conducted an action research study with two companies where we extracted code reviews and build machine learning classifiers (AdaBoost and Convolutional Neural Network — CNN). Our results show that the accuracy of recognizing code fragments that require manual review, measured with Matthews Correlation Coefficient, was 0.70 in the combination of our own feature extraction and CNN. We conclude that this way of combining automation with manual code reviews can improve the speed of reviews while providing organizations with the possibility to support knowledge transfer among the designers.

If a tool can automatically refactor our code – is it good or bad for us, programmers?

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007/s10664-020-09826-7

Image by GimpWorkshop from Pixabay

Recently, I read an article in Empirical Software Engineering about automated code refactoring. I must admit that I refactor quite seldom. It is a tedious task, and for the software that I write, often unnecessary. My software is usually a set of scripts that solve a specific task, and then the key is to document it, not to refactor it. Good documentation helps me understand what I did in that code and how it works. Yes, I know it sounds like a cliché, but that is how it is for me – I switch tasks so often that I forget what the code was doing.

Nevertheless, I do recognize code that is nicely written, formatted and refactored. Therefore, I was on the lookout for a tool that could do something like that for me – suggest a refactoring that I could then implement.

So, this is a paper that I found, describing a tool that I would like to try out. The tool was evaluated with designers and developers. Although they could recognize that the code had been refactored, they seemed to be happy with the result. So, I’m off to try out the tool :)
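To make concrete what kind of suggestion I am hoping for, here is a toy example of my own (not output from the tool in the paper) – splitting a small “Blob”-style class, which mixes calculation and formatting, into two focused classes:

```python
# Before: one class that does too much (a miniature "Blob").
class ReportBlob:
    def run(self, rows):
        total = sum(r["amount"] for r in rows)
        lines = [f"{r['name']}: {r['amount']}" for r in rows]
        return "\n".join(lines) + f"\nTOTAL: {total}"


# After: the kind of extract-class refactoring a tool might suggest.
class Summarizer:
    def total(self, rows):
        return sum(r["amount"] for r in rows)


class ReportFormatter:
    def render(self, rows, total):
        lines = [f"{r['name']}: {r['amount']}" for r in rows]
        return "\n".join(lines) + f"\nTOTAL: {total}"


rows = [{"name": "a", "amount": 1}, {"name": "b", "amount": 2}]
print(ReportFormatter().render(rows, Summarizer().total(rows)))
```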

Abstract: Refactoring is a maintenance activity that aims to improve design quality while preserving the behavior of a system. Several (semi)automated approaches have been proposed to support developers in this maintenance activity, based on the correction of anti-patterns, which are “poor” solutions to recurring design problems. However, little quantitative evidence exists about the impact of automatically refactored code on program comprehension, and in which context automated refactoring can be as effective as manual refactoring. Leveraging RePOR, an automated refactoring approach based on partial order reduction techniques, we performed an empirical study to investigate whether automated refactoring code structure affects the understandability of systems during comprehension tasks. (1) We surveyed 80 developers, asking them to identify from a set of 20 refactoring changes if they were generated by developers or by a tool, and to rate the refactoring changes according to their design quality; (2) we asked 30 developers to complete code comprehension tasks on 10 systems that were refactored by either a freelancer or an automated refactoring tool. To make comparison fair, for a subset of refactoring actions that introduce new code entities, only synthetic identifiers were presented to practitioners. We measured developers’ performance using the NASA task load index for their effort, the time that they spent performing the tasks, and their percentages of correct answers. Our findings, despite current technology limitations, show that it is reasonable to expect a refactoring tools to match developer code. Indeed, results show that for 3 out of the 5 anti-pattern types studied, developers could not recognize the origin of the refactoring (i.e., whether it was performed by a human or an automatic tool). We also observed that developers do not prefer human refactorings over automated refactorings, except when refactoring Blob classes; and that there is no statistically significant difference between the impact on code understandability of human refactorings and automated refactorings. We conclude that automated refactorings can be as effective as manual refactorings. However, for complex anti-patterns types like the Blob, the perceived quality achieved by developers is slightly higher.