Action research in open source – a nice article

Image by Amílcar Vanden-Bouch from Pixabay

https://link.springer.com/article/10.1007/s10664-020-09849-0?utm_source=toc&utm_medium=email&utm_campaign=toc_10664_25_5&utm_content=etoc_springer_20200904

I’ve been looking out for good examples of articles about action research in software engineering for a while. There is a lot of those coming from the participatory design community and ethnography in software engineering.

This paper is an example of how one can conduct action research together with an open source community. It shows how to conduct the research while being part of the community and adds a new angle on the topic – how do we democratize the research design. In contrast to company-based development, an open source community is free to accept the new ways of working or not. Therefore, it can be challenging to make the action happen.

Figure 1 from the paper shows the process in more detail and I strongly recommend to take a look at it. It starts from the design of intervention, where community requirements, similar communities, best practices and problems are inputted. This similar communities precedence is new and important as it helps to leverage already adopted good practices.

The evaluation of the methodology was already done and it shows that it’s a valid and interesting new research method!

Abstract: Participatory Action Research (PAR) is an established method to implement change in organizations. However, it cannot be applied in the open source (FOSS) communities, without adaptation to their particularities, especially to the specific control mechanisms developed in FOSS. FOSS communities are self-managed, and rely on consensus to reach decisions. This study proposes a PAR framework specifically tailored to FOSS communities. We successfully applied the framework to implement a set of quality assurance interventions in the Robot Operating System community. The framework we proposed is composed of three components, interventions design, democratization, and execution. We believe that this process will work for other FOSS communities too. We have learned that changing a particular aspect of a FOSS community is arduous. To achieve success the change must rally the community around it for support and attract motivated volunteers to implement the interventions.

Finding lines of code that require review – my 100 blog post!

Image by skeeze from Pixabay

Working with continuous integration is an exciting new filed. You get your code into the main branch directly. Well, that’s what the theory says. What you really get is feedback directly, at least the feedback from the automated checks for technical debt, testing and similar.

What you do not get quickly is the review of your code by your colleagues. In larger organizations, things like code reviews do not get prioritized. Therefore they tend to slow down software development rather than speed up!

In this paper, we set of to understand how to fix that. We used Gerrit as the tool to extract lines of code to review, instead of reviewing all of the lines. Here is a short video about this: https://play.gu.se/media/t/0_h7hx95d2

The abstract of the paper is included:

Code reviews are one of the first quality assurance tasks in continuous software integration and delivery. The goal of our work is to reduce the need for manual reviews by automatically identify which code fragments should be further reviewed manually. We conducted an action research study with two companies where we extracted code reviews and build machine learning classifiers (AdaBoost and Convolutional Neural Network — CNN). Our results show that the accuracy of recognizing code fragments that require manual review, measured with Matthews Correlation Coefficient, was 0.70 in the combination of our own feature extraction and CNN. We conclude that this way of combining automation with manual code reviews can improve the speed of reviews while providing organizations with the possibility to support knowledge transfer among the designers.

If a tool can automatically refactor our code – is it good or bad for us, programmers?

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007/s10664-020-09826-7

Image by GimpWorkshop from Pixabay

Recently, I’ve read an article in Empirical Software Engineering about automated code refactoring. I must admit that I do refactoring quite seldom. It’s a tedious task and for the software that I write, quite unnecessary. My software is often a set of scripts to solve a specific task and then the key is to document it, not refactor. A good documentation helps me to understand what I did in that code and how it works. Yes, I know it sounds like a cliché, but that’s how it is for me. I’m switching tasks so often that I forget what the code was doing.

Nevertheless, I recognize the code that is nicely written, formatted and refactored. Therefore, I was on a lookout for a tool that could do something like that for me – suggest a refactoring that I could implement.

So, this is a paper that I found, which I would like to try out. It is a tool that was evaluated through interviews with designers and developers. Although they can recognize that the code was refactored, but they seemed to be happy about it. So, I’m off to try out the tool:)

Abstract: Refactoring is a maintenance activity that aims to improve design quality while preserving the behavior of a system. Several (semi)automated approaches have been proposed to support developers in this maintenance activity, based on the correction of anti-patterns, which are “poor” solutions to recurring design problems. However, little quantitative evidence exists about the impact of automatically refactored code on program comprehension, and in which context automated refactoring can be as effective as manual refactoring. Leveraging RePOR, an automated refactoring approach based on partial order reduction techniques, we performed an empirical study to investigate whether automated refactoring code structure affects the understandability of systems during comprehension tasks. (1) We surveyed 80 developers, asking them to identify from a set of 20 refactoring changes if they were generated by developers or by a tool, and to rate the refactoring changes according to their design quality; (2) we asked 30 developers to complete code comprehension tasks on 10 systems that were refactored by either a freelancer or an automated refactoring tool. To make comparison fair, for a subset of refactoring actions that introduce new code entities, only synthetic identifiers were presented to practitioners. We measured developers’ performance using the NASA task load index for their effort, the time that they spent performing the tasks, and their percentages of correct answers. Our findings, despite current technology limitations, show that it is reasonable to expect a refactoring tools to match developer code. Indeed, results show that for 3 out of the 5 anti-pattern types studied, developers could not recognize the origin of the refactoring (i.e., whether it was performed by a human or an automatic tool). We also observed that developers do not prefer human refactorings over automated refactorings, except when refactoring Blob classes; and that there is no statistically significant difference between the impact on code understandability of human refactorings and automated refactorings. We conclude that automated refactorings can be as effective as manual refactorings. However, for complex anti-patterns types like the Blob, the perceived quality achieved by developers is slightly higher.

PHANTOM – finding well engineered software projects, fast…

https://link-springer-com.ezproxy.ub.gu.se/article/10.1007%2Fs10664-020-09825-8

Image by 2427999 from Pixabay

I’ve worked with two great students – Peter and Joshua – who wanted to do something interesting. They developed a tool that could replicate a study from other researchers. However, they did it faster and with less data. We also managed to team up with Mirek from Poznan who improved the classification algorithm and asked his colleagues from new, industrial data.

And this is the outcome – a tool that can connect to a git repository and recognise whether your project is well engineered or not. It helps companies to understand whether their teams are working in a structured manner or ad-hoc.

The tool provides the possibility to assess whether a specific repository is in need for maintenance or not.

Abstract:

Context: Within the field of Mining Software Repositories, there are numerous methods employed to filter datasets in order to avoid analysing low-quality projects. Unfortunately, the existing filtering methods have not kept up with the growth of existing data sources, such as GitHub, and researchers often rely on quick and dirty techniques to curate datasets.

Objective: The objective of this study is to develop a method capable of filtering large quantities of software projects in a resource-efficient way.

Method: This study follows the Design Science Research (DSR) methodology. The proposed method, PHANTOM, extracts five measures from Git logs. Each measure is transformed into a time-series, which is represented as a feature vector for clustering using the k-means algorithm.

Results: Using the ground truth from a previous study, PHANTOM was shown to be able to rediscover the ground truth on the training dataset, and was able to identify “engineered” projects with up to 0.87 Precision and 0.94 Recall on the validation dataset. PHANTOM downloaded and processed the metadata of 1,786,601 GitHub repositories in 21.5 days using a single personal computer, which is over 33% faster than the previous study which used a computer cluster of 200 nodes. The possibility of applying the method outside of the open-source community was investigated by curating 100 repositories owned by two companies.

Conclusions: It is possible to use an unsupervised approach to identify engineered projects. PHANTOM was shown to be competitive compared to the existing supervised approaches while reducing the hardware requirements by two orders of magnitude.

What do elite software developers do in software projects?

https://dl-acm-org.ezproxy.ub.gu.se/doi/10.1145/3387111

Image by Jose B. Garcia Fernandez from Pixabay

A while back I read an article in ZDNet about Linus Torvalds, the creator of Linux, and his daily work. He was (at the time of reading, which is ca. 2 years back) still working on the code. However, he was mostly working on the design of the system, reviewing patches and supporting younger designers. I’ve also read a number of articles which claimed the importance of code reviews as a way of teaching younger designers about the product and the code base.

In this paper, I’ve found that the support for younger designers is what the elite developers do a lot of. It seems that the communication, organisation and support are the activities that the elite developers find important. It’s aligned with what we do at the universities as well. The most elite professors work with students, show them how to program and how to structure their code. Seems like this is a very good way of continuing your career – help other be better.

I guess it’s time to change my wallpaper from “coding” to “teaching”….

Abstract: Open source developers, particularly the elite developers who own the administrative privileges for a project, maintain a diverse portfolio of contributing activities. They not only commit source code but also exert significant efforts on other communicative, organizational, and supportive activities. However, almost all prior research focuses on specific activities and fails to analyze elite developers’ activities in a comprehensive way. To bridge this gap, we conduct an empirical study with fine-grained event data from 20 large open source projects hosted on GITHUB. We investigate elite developers’ contributing activities and their impacts on project outcomes. Our analyses reveal three key findings: (1) elite developers participate in a variety of activities, of which technical contributions (e.g., coding) only account for a small proportion; (2) as the project grows, elite developers tend to put more effort into supportive and communicative activities and less effort into coding; and (3) elite developers’ efforts in nontechnical activities are negatively correlated with the project’s outcomes in terms of productivity and quality in general, except for a positive correlation with the bug fix rate (a quality indicator). These results provide an integrated view of elite developers’ activities and can inform an individual’s decision making about effort allocation, which could lead to improved project outcomes. The results also provide implications for supporting these elite developers.

Your code and AI – more than precision and recall!

Image by Daniel Hannah from Pixabay

Using machine learning and AI to improve your coding is an important area of research. Together with colleagues we work with these techniques, to take them from open source to more industry quality.

There are two great tools that one can use today already. One of the tools is a beta version of add-in for visual studio, which helps software engineers to write code.

https://www.microsoft.com/en-us/ai/ai-lab-code-defect

Microsoft is very active in this area and even has release a set of tools that support the development of AI systems: https://www.microsoft.com/en-us/research/project/visual-studio-code-tools-ai/

Also:

https://techcommunity.microsoft.com/t5/educator-developer-blog/visual-studio-code-tools-for-ai-extension/ba-p/379420

What is great is that the tools are, naturally, available freely!

Another tool is a DeepCode, which analyzes software code and provides suggestions to improve it – e.g. use a specific design pattern or refactoring.

https://www.deepcode.ai/

This is great that we have increasingly more tools and that AI engineering matures. We do not want to have precision and recall steer our development. We want to have real testing and real systems. We also need to work with data quality in order to ensure that the systems are reliable.

The alternative is that we use MCC, precision, recall, F1-score to tell us how good a system is, which is not entirely true. These measures do not provide any view on how the system reflects the requirements put on it. These measures allow us to compare different classifiers, but not systems.

I hope that we can focus more discussion on AI quality and not classification quality/accuracy.

Engineering AI systems – differences to engineering “other” software systems…

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9121629

Image by Free-Photos from Pixabay

Being a software engineer working with AI for a while, I noticed that the engineering of AI systems is different. Well, maybe not building the actual system, but the way in which the knowledge about quality, testing and maintenance differ.

In this article, IEEE Software’s Editor in Chief presents her view on the topic. The main point is that this engineering is both similar and different. This quote from the paper summarizes it nicely: “I argue that our existing design techniques will not only help us make progress in understanding how to design, deploy, and sustain the structure and behavior of AI-enabled systems, but they are also essential starting points. I suggest that what is different in AI engineering is, in essence, the quality attributes for which we need to design and analyze, not necessarily the design and engineering techniques we rely on. “

One of the differences is the process of development. It is not aligned with the non-ML systems, e.g. in terms of training, testing, maintenance. ML systems are data-centric and this needs to be reflected in the AI engineering processes.

Ipek Ozkaya discusses the following misconceptions about the differences:

  • We can specify systems – both AI and non-AI systems cannot really be fully specified,
  • System correctness can be verified – we can never fully verify systems, neither AI-based on non-AI based (e.g. due to complexity),
  • We can avoid hidden dependencies,
  • We can manage system change propagation,
  • Frameworks do it all,
  • We can build reliable systems from unreliable and unpredictable subcomponents

I recommend this article to get a quick overview of the gist of the differences and misconceptions.

Emerging new field – DUO – Data mining and optimization (EMSE article)

Image by klickblick from Pixabay 

New sub-areas or fields within software engineering are not that common, but they come up once in a while. The authors of this article (https://doi-org.ezproxy.ub.gu.se/10.1007/s10664-020-09808-9, Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers) argue that this is the case now.

In this article, the authors provide a view that data mining and building optimization models are done in tandem and that this is the new field. They show that the data mined from repositories influences optimization models and that the development of models influences data mining.

The authors make the following claims (quoted from the paper, references removed):

  • Claim1:For software engineering tasks, optimization and data mining are very similar. Hence, it is natural and simple to combine the two methods.
  • Claim2:For software engineering tasks. optimizers can greatly improve data miners. A data miner’s default tuners can lead to sub-optimal performance. Automatic optimizers can find tunings that dramatically improve that performance.
  • Claim3:For software engineering tasks, data miners can greatly improve optimization. If a data miner groups together related items, an optimizer can explore and report conclusions that are general across a set of solutions. Further, optimization for SE problems can be very slow. But if that optimization executes over the groupings found by a data miner, that inference can terminate orders of magnitude faster.
  • Claim4:For software engineering tasks, data mining without optimization is not recommended. Conclusions reached from an unoptimized data miner can be changed, sometimes even dramatically improved, by running the same tuned learner on the same data. Researchers in data mining should, therefore, consider adding an optimization step to their analysis.

These claims make a lot of sense and they are aligned with my observations. I recommend this article for everyone who is working at or developing a metric team or a data analysis/data science team.

Tools in Agile development

https://arxiv.org/pdf/2004.00289.pdf

Image by Pexels from Pixabay 

Agile software development is here to stay. Software engineering has been changed completely when it arrived and I cannot see how it will change back to waterfall.

Even the so-called post-agile methodologies are built on the same premises – collaboration within teams, empowerement, taking responsibility. Since we educate our future software engineers to be self-organizing, to take responsibility for their topics, projects and personal development, we will live with see a development in this direction.

At the same time, the tools that are needed to support these teams evolve. When I started working as a software engineer, Rational ClearCase was the go-to tool for version control. Now, my students don’t even recognize it. Well, Rational does not exist any more either – the last time I checked it was bought by IBM, no one knows what happens now.

In this paper, the authors take a look at how modern collaboration patterns look like and which tools are used there. They have observed that consolidation of tools (e.g. towards Slack) introduced very positive dynamics to the team. However, they also found that some automated notifications (e.g. from Jenkins) got lost in these new tools, which was not appreciated.

This case study is a longitudinal one, which is a great example of how a research team builds up a long-term collaboration, so important for modern academia and industry!

Developing recommender systems – a framework which may just be the one (for us)…

https://rdcu.be/b3wgZ

Image by Gerd Altmann from Pixabay 

Creating recommendation systems is a tricky task. We need to add the temporal domain to the data. In particular, we need to make sure that we capture what was recommended before to the specific user and how the user reacted upon that. We also need to capture the evolution of the users and the data.

In this paper, the authors present a framework, RectoLibry, which helps to construct these kind of systems. The system captures both the parts of the development of the recommendations, but also their deployment.

The system is based on designing an ontology (yes, my old, good friend, used since before Web 2.0, even in my own research: https://link.springer.com/chapter/10.1007/978-3-540-87875-9_60 , https://link.springer.com/chapter/10.1007/3-540-46102-7_20 ).

The ontology describes the relationships existing in the recommendation domain and provide the support for the selections and feedback loops.

I recommend to take a look at the paper and the framework if you want to build a recommendation system. I will, when looking at the assignments from the software measurement PhD course.