The speed of light and laws of software engineering…

Image by ParallelVision from Pixabay

While on vacation, I managed to watch a number of sci-fi movies that I had wanted to see but never found the time for during the academic year. This got me thinking about certain laws of physics and the laws of software engineering. I think there are many similarities, so let me start with the speed of light.

First of all, the speed of light is the fastest speed we know of, and Einstein’s theory of relativity says that we cannot travel faster than it. Even if we could reach it, travelling at such a speed would be tremendously difficult:

  • How would we know where we are going if we cannot see where we are going (we would, quite literally, travel as fast as we can see)? The trip would be very fast and, probably, very short.
  • If we could somehow see where we are going, or detect an obstacle (e.g. by predicting its position from prior observations), how would we steer? At that speed, for the majority of the travel time we would be moving in straight lines. These straight lines would be similar to either the hyperjumps from Star Wars or the “clicks” from Guardians of the Galaxy.
  • If we are literally the fastest objects around, how could others see us and avoid us? Is it possible to avoid light? No, it is not. This means that we would be utterly chaotic for everything around us.

Therefore, I think that, even if we could, travelling at the speed of light is probably not the best idea. At least not as we conceive of it today, which may change as we ourselves change.

How does this relate to the laws of software engineering, then? Well, I would start with the laws of complexity.

For a while now, I have been broadcasting the opinion that the complexity of software cannot be reduced, only hidden. Complex problems need complex software, and complex software cannot be simple. If our algorithms have many conditions, we cannot take them away; we can hide them in functions, but we can never get rid of them. That is the first parallel – we cannot travel faster than the speed of light.
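To illustrate what I mean by hiding, here is a toy example of my own (not from any particular system): the pricing rules below cannot be removed if the problem needs them, only moved behind a name.

```python
def discount_rate(customer_type: str, order_value: float) -> float:
    """All the conditional complexity lives here, hidden behind one name."""
    if customer_type == "vip" and order_value > 1000:
        return 0.20
    if customer_type == "vip":
        return 0.10
    if order_value > 1000:
        return 0.05
    return 0.0


def final_price(customer_type: str, order_value: float) -> float:
    # The caller looks simple, but the complexity has only moved, not vanished.
    return order_value * (1.0 - discount_rate(customer_type, order_value))


print(final_price("vip", 1500.0))  # 1200.0
```

The caller of final_price sees one simple expression, but every condition is still there, one level down.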

We can hide complexity, and thus make the program/software easier to understand and maintain, but the better we hide it, the harder it becomes to anticipate it. Packaging complex algorithms in simple building blocks makes modifications harder – or rather, not the modifications themselves, but overseeing the consequences of those modifications.

If we simplify the program/algorithm too much, we have to expect that it will produce erroneous results for some cases – again, complex problems require complex programs. An example of such an issue is approximating continuous functions: since our computers are discrete, there is always some degree of error in the approximation.
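A minimal sketch of that discretization error (my own toy example): approximating the integral of a continuous function with a finite sum gets closer as we add steps, but for any finite number of steps some error always remains.

```python
import math

def midpoint_integral(f, a, b, n):
    """Approximate the integral of f on [a, b] with an n-step midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The integral of sin(x) on [0, pi] is exactly 2.0; watch the error shrink
# with more steps, without ever reaching zero.
for n in (10, 100, 1000):
    approx = midpoint_integral(math.sin, 0.0, math.pi, n)
    print(n, approx, abs(approx - 2.0))
```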

Finally, interconnectivity and modularity as means of handling complexity have their limits. I do not think we can keep developing increasingly complex software simply by increasing its size; that will become difficult in the long run. We need to make sure that we have the competence to handle complexity, and we need to be able to make that complexity apparent.

autoML – let’s talk about it…

Image from Pixabay

AutoML: a promise of green pastures, less work, and optimal results. So, is it really like that? In this post I share my view and my experience from running a first test with one such framework.

First of all, let’s be honest: there is no such thing as a free lunch. In the case of autoML (auto-sklearn), the first part of the price tag is the effort, skills and time needed to install it and make it work. The second is the performance… It is painfully slow compared to training your own models, simply because it tests a large number of models along the way, and it also takes a long time just to download and set everything up.

But, first things first, let me tell you where I started. I used the data from the MicroHRV project (3. MicroHRV: Recognizing Rare Events in Microwave Radio Links and Intensive Care Units using Machine Learning – Software Center (software-center.se)). The data comes from patients undergoing surgery to remove blood clots from the brain (as dangerous as it may sound, the actual procedure is planned and calm). I wanted to check whether autoML could do better than what we have at the moment.

What we have at the moment (for that particular dataset) is: Accuracy: 0.98, Precision: 0.98, Recall: 0.98 – using a Random Forest classifier. This is already very good. For the medical domain, it is in a class of its own, given that our previous studies ended up with ca. 0.7 accuracy at best.
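For context, the baseline evaluation looks roughly like this – a minimal sketch using a synthetic stand-in for the MicroHRV data, which is not public:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the (non-public) MicroHRV features and labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_test, y_pred, average="weighted"))
print(confusion_matrix(y_test, y_pred))
```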

When it comes to installing autoML – if you enjoy Stack Overflow, downgrading, upgrading, compiling, etc., and you run Windows 10, then it’s your heaven. If you run Linux – no problem. Otherwise – stick to manual analyses :)

After two days (and nights) of trying, the best configuration was:

  • WSL – Windows Subsystem for Linux
  • Ubuntu 20, and
  • countless OSS libraries

It takes a while to get it to work; the question is whether the results are good enough…
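For reference, the run itself boils down to something like this – a minimal sketch using auto-sklearn’s documented interface, with the same synthetic placeholder data as in the Random Forest sketch above; the time limits only roughly match my actual run:

```python
import autosklearn.classification
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the (non-public) MicroHRV data, as before.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3 * 3600,  # total search budget, in seconds
    per_run_time_limit=300,            # cap per candidate model, in seconds
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_test, y_pred, average="weighted"))
```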

After three hours of waiting, a lot of heat from my laptop, and over 1,000 models tested, the result was: Accuracy: 0.91, Precision: 0.94, Recall: 0.91.

So, worse than my manual selection of models. I include the confusion matrices.

Confusion matrix from autoML
Confusion matrix from the Random Forest

The matrices are not that different, since the validation sets are not that large either. Still, the Random Forest seems to beat the best model from autoML.

I need to work more on this and see if I am doing something wrong. For now, I take it as a success – I am still better than autoML (there is some use for an old professor yet) – rather than a let-down about not getting better results.

At the end of the day, 0.98 accuracy is still very good!

Reproducing AI models – a guideline

Image by Pete Linforth from Pixabay

2107.00821.pdf (arxiv.org)

Machine learning has become a great tool in software engineering, for both research and development. Having access to TensorFlow, PyTorch, and other toolkits provides almost endless possibilities. Combine that with the hundreds (if not thousands) of datasets on Zenodo and Co., and you can train a model for almost anything.

So far, so good, I would say. Problems (yes, there are always some problems) appear when we want to reproduce the results of others. Training a model on your own dataset and making it available is easy. Trusting such a model in a new context is not.

Imagine an ML model trained on data from Company X. We have probably tuned its parameters a lot, so the model works great there – but does it work for Company Y? Most probably not. Well, it will run, but the performance of its predictions is not going to be great.
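A toy sketch of that effect (synthetic data, not from any real company): a model tuned on one dataset and then applied to data with a different underlying relationship between features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# "Company X" and "Company Y" stand-ins: different feature/label relationships.
X_x, y_x = make_classification(n_samples=2000, n_features=20, random_state=1)
X_y, y_y = make_classification(n_samples=2000, n_features=20, random_state=2)

Xtr, Xte, ytr, yte = train_test_split(X_x, y_x, test_size=0.3, random_state=1)
model = RandomForestClassifier(random_state=1).fit(Xtr, ytr)

print("Company X (same context):     ", accuracy_score(yte, model.predict(Xte)))
print("Company Y (different context):", accuracy_score(y_y, model.predict(X_y)))
```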

So, Google has partnered up with academic groups to set up SIGMODELS and TensorFlow Garden, initiatives aimed at making ML models more portable, experiments more replicable, and all the other goodies.

In this paper, the authors provide a set of checks that we can use to make models more transparent, which is the first step towards reproducibility. In these guidelines, they advocate reporting the models’ architecture, their input and output structure, their building blocks, loss functions, etc.

Naturally, they also recommend reporting the metrics that were used to optimize the models, e.g. accuracy, F1-score, MCC and others. I know these are probably essentials, but you would be surprised how many authors do not actually report them. If they are omitted, how do we know whether the values were simply so poor that the authors left them out (low performance of the model), or whether they just were not relevant (low relevance of the metrics – which would be the more benign explanation)?
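As a minimal illustration of the kind of reporting the guidelines ask for – the field names below are my own placeholders, not the paper’s checklist, and the metric values are borrowed from my earlier Random Forest post purely as an example:

```python
import json

# Hypothetical, minimal "model report"; adapt the fields to your own model.
model_report = {
    "architecture": "RandomForestClassifier, 100 trees",
    "input": "20 numeric features per observation",
    "output": "binary label: event / no event",
    "training_criterion": "Gini impurity",
    "optimized_metrics": {"accuracy": 0.98, "precision": 0.98, "recall": 0.98},
    "training_data": "proprietary dataset, not publicly available",
    "known_limitations": "describe sampling, labelling and context limitations here",
}

print(json.dumps(model_report, indent=2))
```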

For now, these guidelines are only a draft, but I hope that they will become mainstream, just like the empirical guidelines from ACM (GitHub – acmsigsoft/EmpiricalStandards: Empirical standards for conducting and evaluating research in software engineering).

News from ML?

I seldom write about films and events – well, maybe actually never – but this year a lot has happened online.

What’s new in Machine Learning | Keynote – YouTube

The video above covers the news from Google about the TensorFlow library, which includes new ways of training models, compression and performance tuning, and more.

TensorFlow Lite and TensorFlow.js allow us to use the same models we run on desktops, but on mobile devices and in the browser. Really impressive. I’ve caught myself wondering whether I am more impressed by the hardware capabilities of small devices or by the capabilities of the software. Either way – super cool.
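To give a feeling for what moving a model to a small device involves, here is a minimal sketch using the standard TensorFlow Lite converter on a placeholder Keras model (my own example, not tied to anything announced in the keynote):

```python
import tensorflow as tf

# Placeholder Keras model standing in for whatever you trained on the desktop.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert the model to the TensorFlow Lite format used on mobile/embedded devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```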

Google is not the only company making announcements. NVIDIA is also showing a lot of cool features for enterprises, with cloud access for rapid prototyping, model testing and deployment at the centre of it.

NVIDIA Executive Keynote for Enterprise AI at COMPUTEX 2021 – YouTube

I like gaming, so this is impressive, but even more impressive is last year’s DLSS technology, which still cannot be beaten by the competition. Really nice.

New programming tools?

Glinda: Supporting Data Science with Live Programming, GUIs and a Domain-specific Language (acm.org)

I’m not going to add a picture here, because the actual paper contains a great one, which is copyrighted. But do we really need another tool (I thought)? If you think like that… well, think again.

Once I looked at the paper, I really liked the idea. This is a tool that combines the programming tasks of software engineers with tasks such as data exploration, labelling and cleaning. It is a bit like Jupyter Notebook, but it lets you interact with the data in a deeper way.

I strongly recommend taking a look at the tool. I’ve done a quick check and it looks really nice.

What makes a great code maintainer…

Image by Rudy and Peter Skitterians from Pixabay



ICSE2021_B.pdf (igor.pro.br)

For many of us, software engineering is about the possibility of creating new projects, new products and cool services. We do that often, but we equally often forget about maintenance. Well, maybe not forget, but we deliberately choose not to think about it. That is natural, as maintaining old code is not seen as particularly interesting.

Reading this paper, I realized that my view of maintenance is a bit dated. In my time in industry, maintenance was mostly “bug-fixing”. Today, it is much more about community work. As the abstract of the paper says: “Although Open Source Software (OSS) maintainers devote a significant proportion of their work to coding tasks, great maintainers must excel in many other activities beyond coding. Maintainers should care about fostering a community, helping new members to find their place, while also saying “no” to patches that although are well-coded and well-tested, do not contribute to the goal of the project.”

The paper is based on a series of interviews with software maintainers. In short, the results are that great software maintainers are:

  • Available (short response times),
  • Disciplined (follows the process),
  • Aware of the global picture of what the review should achieve,
  • Communicative,
  • Empathetic,
  • Community-building,
  • Technically excellent,
  • Quality-aware,
  • Experienced in the domain,
  • Motivated,
  • Open-minded,
  • Patient,
  • Diligent, and
  • Responsible.

It is a long list, and the priority of each of these characteristics differs from one reviewer to another. What matters is that we see the software maintainer as a social person who contributes to the community, rather than someone who sits in a dark office and reads code all day long. Maintainers are really the people who make software engineering groups work well.

After reading the paper, I’m more motivated to maintain the community of my students!

Data labelling – the activity that makes people hate ML…

Image by S. Hermann & F. Richter from Pixabay

Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies | SpringerLink

Machine learning is hungry for data. The more you have, the happier it will be. It all seems very easy when we are learning how to program ML and how it works – there are plenty of open data sources to practice and learn from.

However, when we want to use ML for our own purposes, things get a bit more complicated. There is a lot of data, but not in the right format. The data that is in the right format is incomplete. The data that is complete is noisy. The data that is not noisy is too small, so we need to collect more. And so the story goes on, and on, and on…

Collecting the data is not that problematic, as it can often be automated – at least in software engineering, automotive, telecom, transport/logistics and medicine, the domains I know. What is problematic, though, is data labelling. This is the activity where we take each data point and assign it a class, or a label if we speak machine-learnish. The person doing the labelling needs to be competent enough to label the data correctly – they need to know the domain, the data and the context. They also need a fantastic memory, because the labels must be consistent, and the labels need to be unambiguous given the underlying feature vector.
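One simple way to make that consistency requirement concrete (my own illustration, not from the paper) is to have two people label the same sample and measure their agreement, for example with Cohen’s kappa:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for the same ten data points from two annotators.
annotator_a = ["event", "event", "normal", "normal", "event",
               "normal", "normal", "event", "normal", "normal"]
annotator_b = ["event", "normal", "normal", "normal", "event",
               "normal", "event", "event", "normal", "normal"]

# Cohen's kappa corrects raw agreement for chance; values near 1.0 suggest
# consistent labelling, values near 0 suggest the labels are not reliable.
print("Agreement (kappa):", cohen_kappa_score(annotator_a, annotator_b))
```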

In this paper, colleagues from our department study the process of data labelling and its challenges.

Among the challenges they identify are:

  • Lack of a systematic approach to labeling data for specific features
  • Unclear responsibility for labeling
  • Noisy labels
  • Difficulty to find a correlation between labels and features
  • Skewed label distributions
  • Time dependence
  • Difficulty to predict future uses for datasets

I think it is great work, and a good read for everyone who wants to get into ML for real, start using it at a company, and understand whether it actually gives any benefit.

From the abstract: Labeling is a cornerstone of supervised machine learning. However, in industrial applications, data is often not labeled, which complicates using this data for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these still do not provide accurate and reliable labels for every machine learning use case in the industry. In this context, the industry still relies heavily on manually annotating and labeling their data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using a semi-structured interview with data scientists at two companies to explore their problems when labeling and annotating their data. This paper provides two contributions. We identify industry challenges in the labeling process, and then we propose mitigation strategies for these challenges.

Action research in open source – a nice article

Image by Amílcar Vanden-Bouch from Pixabay

https://link.springer.com/article/10.1007/s10664-020-09849-0?utm_source=toc&utm_medium=email&utm_campaign=toc_10664_25_5&utm_content=etoc_springer_20200904

I have been looking for good examples of articles about action research in software engineering for a while. There are quite a few coming from the participatory design community and from ethnography in software engineering.

This paper is an example of how one can conduct action research together with an open source community. It shows how to do the research while being part of the community and adds a new angle to the topic – how to democratize the research design. In contrast to company-based development, an open source community is free to accept or reject new ways of working, so it can be challenging to make the action happen at all.

Figure 1 in the paper shows the process in more detail, and I strongly recommend taking a look at it. The process starts with the design of the interventions, which takes community requirements, similar communities, best practices and known problems as input. The “similar communities” input is new and important, as it helps to leverage good practices that other communities have already adopted.

The methodology has already been evaluated, and the evaluation shows that it is a valid and interesting new research method!

Abstract: Participatory Action Research (PAR) is an established method to implement change in organizations. However, it cannot be applied in the open source (FOSS) communities, without adaptation to their particularities, especially to the specific control mechanisms developed in FOSS. FOSS communities are self-managed, and rely on consensus to reach decisions. This study proposes a PAR framework specifically tailored to FOSS communities. We successfully applied the framework to implement a set of quality assurance interventions in the Robot Operating System community. The framework we proposed is composed of three components, interventions design, democratization, and execution. We believe that this process will work for other FOSS communities too. We have learned that changing a particular aspect of a FOSS community is arduous. To achieve success the change must rally the community around it for support and attract motivated volunteers to implement the interventions.

Finding lines of code that require review – my 100th blog post!

Image by skeeze from Pixabay

Working with continuous integration is an exciting new field. You get your code into the main branch directly – well, that is what the theory says. What you really get directly is feedback, at least the feedback from the automated checks for technical debt, testing and the like.

What you do not get quickly is the review of your code by your colleagues. In larger organizations, code reviews do not get prioritized, so they tend to slow software development down rather than speed it up!

In this paper, we set out to understand how to fix that. We extracted code reviews from Gerrit and trained classifiers to point out which lines of code need a manual review, instead of reviewing all of the lines. Here is a short video about it: https://play.gu.se/media/t/0_h7hx95d2
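To give a feeling for the idea, here is a much simplified sketch (not the pipeline from the paper, which uses its own feature extraction and a CNN): represent each line of code as token counts, train a classifier on which lines were flagged in past reviews, and evaluate with MCC.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Hypothetical toy data: lines of code and whether a reviewer flagged them
# (1 = needs manual review). Repetition only provides enough samples to split.
lines = [
    "if user is None: return None",
    "result = compute(a, b)",
    "except Exception: pass",
    "logger.info('done')",
    "data = eval(request.body)",
    "return render(request, 'index.html')",
] * 50
labels = [0, 0, 1, 0, 1, 0] * 50

X = CountVectorizer(token_pattern=r"\w+").fit_transform(lines)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=7)

clf = AdaBoostClassifier(random_state=7).fit(X_train, y_train)
print("MCC:", matthews_corrcoef(y_test, clf.predict(X_test)))
```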

The abstract of the paper is included below:

Code reviews are one of the first quality assurance tasks in continuous software integration and delivery. The goal of our work is to reduce the need for manual reviews by automatically identify which code fragments should be further reviewed manually. We conducted an action research study with two companies where we extracted code reviews and build machine learning classifiers (AdaBoost and Convolutional Neural Network — CNN). Our results show that the accuracy of recognizing code fragments that require manual review, measured with Matthews Correlation Coefficient, was 0.70 in the combination of our own feature extraction and CNN. We conclude that this way of combining automation with manual code reviews can improve the speed of reviews while providing organizations with the possibility to support knowledge transfer among the designers.

If a tool can automatically refactor our code – is it good or bad for us, programmers?

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007/s10664-020-09826-7

Image by GimpWorkshop from Pixabay

Recently, I read an article in Empirical Software Engineering about automated code refactoring. I must admit that I refactor quite seldom. It is a tedious task, and for the software that I write, often unnecessary. My software is usually a set of scripts that solve a specific task, and then the key is to document it, not to refactor it. Good documentation helps me understand what I did in that code and how it works. Yes, I know it sounds like a cliché, but that is how it is for me – I switch tasks so often that I forget what the code was doing.

Nevertheless, I do recognize code that is nicely written, formatted and refactored. Therefore, I was on the lookout for a tool that could do something like that for me – suggest a refactoring that I could then implement.

So, this is a paper that I found, describing a tool that I would like to try out. The tool was evaluated with designers and developers. Although they could recognize that the code had been refactored, they seemed to be happy with the result. So, I’m off to try out the tool :)
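To make concrete what kind of suggestion I am hoping for, here is a toy example of my own (not output from the tool in the paper) – splitting a small “Blob”-style class, which mixes calculation and formatting, into two focused classes:

```python
# Before: one class that does too much (a miniature "Blob").
class ReportBlob:
    def run(self, rows):
        total = sum(r["amount"] for r in rows)
        lines = [f"{r['name']}: {r['amount']}" for r in rows]
        return "\n".join(lines) + f"\nTOTAL: {total}"


# After: the kind of extract-class refactoring a tool might suggest.
class Summarizer:
    def total(self, rows):
        return sum(r["amount"] for r in rows)


class ReportFormatter:
    def render(self, rows, total):
        lines = [f"{r['name']}: {r['amount']}" for r in rows]
        return "\n".join(lines) + f"\nTOTAL: {total}"


rows = [{"name": "a", "amount": 1}, {"name": "b", "amount": 2}]
print(ReportFormatter().render(rows, Summarizer().total(rows)))
```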

Abstract: Refactoring is a maintenance activity that aims to improve design quality while preserving the behavior of a system. Several (semi)automated approaches have been proposed to support developers in this maintenance activity, based on the correction of anti-patterns, which are “poor” solutions to recurring design problems. However, little quantitative evidence exists about the impact of automatically refactored code on program comprehension, and in which context automated refactoring can be as effective as manual refactoring. Leveraging RePOR, an automated refactoring approach based on partial order reduction techniques, we performed an empirical study to investigate whether automated refactoring code structure affects the understandability of systems during comprehension tasks. (1) We surveyed 80 developers, asking them to identify from a set of 20 refactoring changes if they were generated by developers or by a tool, and to rate the refactoring changes according to their design quality; (2) we asked 30 developers to complete code comprehension tasks on 10 systems that were refactored by either a freelancer or an automated refactoring tool. To make comparison fair, for a subset of refactoring actions that introduce new code entities, only synthetic identifiers were presented to practitioners. We measured developers’ performance using the NASA task load index for their effort, the time that they spent performing the tasks, and their percentages of correct answers. Our findings, despite current technology limitations, show that it is reasonable to expect a refactoring tools to match developer code. Indeed, results show that for 3 out of the 5 anti-pattern types studied, developers could not recognize the origin of the refactoring (i.e., whether it was performed by a human or an automatic tool). We also observed that developers do not prefer human refactorings over automated refactorings, except when refactoring Blob classes; and that there is no statistically significant difference between the impact on code understandability of human refactorings and automated refactorings. We conclude that automated refactorings can be as effective as manual refactorings. However, for complex anti-patterns types like the Blob, the perceived quality achieved by developers is slightly higher.