September 2021 – SE metrics (Software Engineering)

How to make your pull request merged quickly… (article highlight)

How Developers Modify Pull Requests in Code Review | IEEE Journals & Magazine | IEEE Xplore

I must admit that I’m not the greatest contributor to OSS projects. Yes, I did a few of those and contributed to projects, but this is more like a hobby than a real work. My goal for 2022 is to make it better and even put together some docker containers to make my scripts more reusable. I even bought a book about Docker, which I’ve read and (theoretically) I’m good to go.

Anyways, I stumbled upon this work which is about how developers make good pull requests. The paper has examined OSS projects and found that you need to make a clear change as part of the pull request, you need to make a clear classification of that change and then you have a high chance that the pull request will be adopted soon.

Stay tuned for more on code reviews…

Predicting defect, but continuously (article highlight)

Continuous Software Bug Prediction (yorku.ca)

Although a lot has been written about predicting defects, the problem is still valid. Some systems have more defects than others. In academia, we can do two things – educate young engineers in making better software or construct models for predicting where and when to find defects.

A lot of work in the defect prediction model development focuses on more-or-less randomly found releases. However, software development is not random, but structured, and often, continuous. This means that it’s important to understand that not all defects are found in the same release/path/commit as they are introduced (BTW: there is a lot of work on this aspect too).

In this work, the authors analyze 120 continuous releases of six software products and demonstrate the value of their prediction models. The novelty of this approach is a system that checks whether the releases are similar to one another based on the distributional characteristics. This means that the prediction models are tuned to each release based on these characteristics. These characteristics are, mostly, well-known metrics like the average cyclomatic complexity of a file, a MaxInheritanceTree of a class, etc. So – easy to collect and analyze, a lot of tools can be used for that.

The results, in short, show that the new method is better than randomly choosing a release or bagging releases. The results differ per project, but the approach is better than the other two, across the board.

I like the approach and will try it the next time I get my hands on software defects, issues, challenges. Let’s see when that happens:)

Guiding the selection of research methodologies (article highlight)

Guiding the selection of research methodology in industry–academia collaboration in software engineering – ScienceDirect

Research methodology is something that we must follow when conducting research studies. Without a research methodology, we just search for something and if we find it, we do not know if this finding is universal, true, or even if it really exists…

In my early works, I got really interested in empirical software engineering, in particular in experimentation. One of the authors of this article was one of my supervisors and I fell for his way of understanding and describing software engineering – as an applied area of research.

Over time, I realized that experimentation is great, but it is still not 100% what I wanted. I understood that I would like to see more collaboration with software engineers in the industry, those who make their living by programming, architecting, testing, modifying the code. I did a study at one of the vehicle manufacturers in Sweden, where I studied the complexity of the entire car project. There I understood that software engineering needs to be studies and practices in the industry. Academia is the place where we shape young minds, where we can gather multiple companies to share their experiences, and where we can make findings from individual cases into universal laws.

In this article, the authors discuss research methodologies applicable for industrial, or industry-close research. They discuss even one of the technology transfer models as a way of research co-production and co-validation.

The authors conclude this great overview in the following way (from the conclusions):

When it comes to differences, the three methodologies differ in their primary objective: DSM on acquiring design knowledge through the design of artifacts, AR on change in socio-technical systems, and TTRM on the transfer of research to industry. The primary objective of one methodology may be a secondary objective in another. Thus, the differences between them are more in their focus than in which activities they include.

In our analysis and comparison of their feasibility for industry–academia collaboration in software engineering research, the selection depends on the primary objective and scope of the research (RQ3). We, therefore, advice researchers to consider the objectives of their software engineering research endeavor and select an appropriate methodological frame accordingly. Furthermore, we recommend studying different sources of information concerning, in particular, the chosen research methodology to better understand the methodology before using it when conducting industry–academia collaborative research.

I will include this article as mandatory reading in my AR Ph.D. course in the future.

The speed of light and laws of software engineering…

While on vacation, I managed to watch a number of sci-fi movies, which I wanted to watch but did not have the time during the academic year. This got me thinking about certain laws of physics and the laws of software engineering. I think there are many similarities, and let me start by considering the speed of light, as a starter.

First of all, what we know about the speed of light is that it’s the fastest we know and that Einstein’s theory of relativity says that we cannot travel faster than the speed of light. Even if we were, such a speed is tremendously difficult.

How would we know where we travel if we cannot see where we travel (we go as fast as we can see, literally). So, the travel would be very fast and, probably, very short.
If we, somehow, see where we go, or detect an obstacle (e.g. by knowing its predicted position based on prior observations), how can we steer? We’re going so fast, that for the majority of the travel time, we would be going in straight lines. These straight lines would be similar to either the hyperjumps from Star Wars or the clicks from the Guardians of the Galaxy.
If we’re literally the fastest objects, how can others see us and avoid is? Is it possible to avoid the light? No, it’s not possible. This means that we would be super-chaotic.

Therefore, I think that, even if we could, traveling at the speed of light is probably not the best idea. At least not conceptually, which may change as we change.

How does it affect the laws of software engineering then. Well, I would start with the laws of complexity.

For a while now, I’ve been broadcasting the opinion that the complexity of software cannot be reduced, it can only be hidden. For complex problems, we need complex software and complex software cannot be simple. If our algorithms have many conditions, we cannot take them away, we can hide them in functions, but never get rid of them. That’s the first parallel – we cannot travel faster than the speed of light.

We can hide complexity, and thus make the program/software easier to understand and maintain, but the better we hide it, the harder it is to avoid/predict complexity. Packaging complex algorithms in simple blocks will make it difficult to make modifications. Actually, not to make the modifications, but to overview the consequences of these modifications.

If we simplify the program/algorithm too much, we need to expect that it’s going to provide erroneous results for some cases – again, complex problems require complex programs. An example of such issue is approximating continuous functions – since our computers are discrete, there is always some degree of error in such an approximation.

Finally, interconnectivity and modularity as a means of handling complexity have their limits. I do not think we can develop increasingly complex software by increasing its size. I believe it’s going to be difficult in the long run. We need to make sure that we have the competence to handle complexity and we need to be able to make the complexity apparent.