Ethics in data mining

BIld av Tumisu från Pixabay

https://link.springer.com/article/10.1007/s10664-021-10057-7

A lot of software engineering research studies use open source data and mine software repositories. It’s a common practice since it allows to test our hypotheses before asking for previous resources from our collaborating companies. By mining open source data we can also learn whether our study makes sense; we can see it as a pilot study of some sorts.

Mining software repositories has evolved into a popular activity since we got access to repositories like Github. There are even guidelines for assessing this kind of studies, e.g., https://sigsoft.org/EmpiricalStandards/docs/ and we have regulations of what we can do with the open source data – these can be in the form of a license, law (like GDPR or the CCPA) or the need for asking an ethical board for an approval. However, there is also a common sense – not everything that is legal is appropriate or ethical. We always need to ensure that no individual can be a subject to any harm as a result of our actions.

In the article that I want to bring up today, the authors discuss the ethical frameworks for ethical software engineering studies based on open source repositories. We need to make sure that:

  1. We respect the persons, which stresses the need for approval and consent.
  2. Beneficence, which means that we need to minimize the harm, but maximize the benefit.
  3. Justice, which means that we need to consider each individual equally.
  4. Respect for law and public interest, which entails conducting due diligence on which data we can use and in which way.

The most interesting part of this article is the analysis of different cases of mining software repositories. For example, the case of analyzing the code, reviews, commit messages and other types of data in the repositories.

I recommend this article for everyone who considers working with mining software repositories.

What the GPT technology really changes in SE

Image source: pixabay

GPT technology, exemplified by the Github Copilot and its likes, changes software engineering to the ground. There is no doubt that the technology places a new tool in our engineering shed. It allows us to create software with a completely different set-up than what we are used to.

Now, what it really changes is only a few things, but these are very big ones.

  1. Programmers —> designers and architects. GPT can write source code like no other tool on the market. And it only gets better at this. A quick glimpse at the Github Next website gives us a good understanding that this team has only got started. This changes everything we know about engineering software. Bad programmers will disappear over time. Good software designers, architects and software engineers will take their place. They will be fewer in number, but better in quality.
  2. Software development —> software engineering. Designers will no longer get stuck in solving a small bit of a puzzle. GPT will do it for them. Instead of thinking how to write a test case, the designers will think how to test the software in the best possible way. They will focus on the engineering part of the software engineering. Something that I’m teaching my students from day one.
  3. Consultancy —> knowledge hubs. Since programming will become easier and more approachable, we will need people who know how to solve a problem, not how to write a program. This big chunk of business of the consultancy companies will disappear. The consultancy companies will specialize in their domains and in problem-solving.

There will also be other things that will happen. Requirements will not be the same as they are. Testing will be different, architecting will be smarter and management more optimal. Knowledge will be more valued and critical thinking will be needed even more.

Well, this is my end of the academic year blog post. More to come after the summer. Stay safe!

Notes from an apocalypse…

NOTES FROM AN APOCALYPSE : MARK O’CONNELL: Amazon.se: Böcker

I’ve read this book recently as the title and the authors caught my attention. Can you really write notes from the apocalypse? Well, turns out that the authors of this book made a very interesting twist to it.

This book is about people who prepare for the apocalypse. It takes us to a number of places where we meet people who prepare for the worse. For me, the most interesting was a guy who bought an old army bunker and prepared a reasonably priced ranch for survining after a nuclear war. Well, reasonably is still 35,000 dollars, but given that you get to live through the worse, maybe it’s not that expensive.

However, it was not the price that caught my eye. It was essentially how he marketed that shelter. The shelter itself was quite spartan, as opposed to shelter for the ultra-rich people with pools, game rooms, cinemas and what have you.

The main selling point for the shelter was not the spartan condition, it was the dream and the possibility of survival. The owner was selling people on the idea that they will be the ones to create the brave new world after the old one collapses.

I’m not certain that there would be world after the nuclear apocalypse (Chernobyl’s disaster happen 30 years ago and the area will be inhabitable for the next 200 years), but I did like the way he sold the “condos” in the shelter. Quite brilliant, actually.

Human compatible… (book review)

Human Compatible: AI and the Problem of Control : Russell, Stuart: Amazon.se: Böcker

AI is here to stay. We know that. It will only grow in its influence. We know that too. Especially after the release of ChatGPT we know that.

This book looks into different scenarios of co-existence between humans and AI. This is a novel view on the topic, which differentiates this book from the other of this kind. The previous view was either about some sort of doomsday theories how AI takes over the world. Well, there was also a view that AI will never really hit it off, because of the lack of conciousness and a human soul.

This book starts by looking at the historical development of humanity when a new technology was invented. First we have some limitations, which stop us from mass-using this technology. Then, we improve it and start using it a lot, which creates jobs and new markets. Then we automate it so that it can scale fast, which causes mass loss of jobs related to it.

Imagine banking – first, it was manual, which was cumbersome and error prone. Then came calculating machines, which required an army of operators who inputted simple instructions and got results. Then computers came and finally the Internet. Banks are still there, as institutions, but the job of a banker is not the same as 100 years ago. Well, it’s not really the same as 20 years ago; somewhat similar to 10 years ago, but not really.

The same goes with AI and therefore we need to lear how to co-exist with it. We can control it, or we can adjust to it or we can co-develop it and take advantage of it.

I strongly recommend this book as a reading about how to tackle the developments in AI, but more realistically, not doomsday profecy-style.

Continuous deployment in systems of systems…

Continuous deployment in software-intensive system-of-systems – ScienceDirect (gu.se)

Interestingly, this is a paper from colleagues of ours from the department. The paper presents how one company – Ericsson – works with continuous deployment of their large software system in 3G RAN (Radio Access Networks). The highlights from the article are as follows:

  • New software field testing and validation activities become continuous.
  • Software deployment should be orchestrated between the constituent system.
  • A pilot customer to partner with is key for success.
  • Companywide awareness and top management support are important.
  • Documentation and active monitoring are critical for continuous deployment.

I like this paper because it presents a practical approach and a good set of practices that can be taken up by other companies.

Transparency and explainability of AI…

Image by Sergey Gricanov from Pixabay

Transparency and explainability of AI systems: From ethical guidelines to requirements – ScienceDirect

In the area of ChatGPT and increasingly larger language models, it is important to understand how these models reason. Not only because we want to put them in safety-critical systems, but mostly because we need to know why they make things up.

In this paper, the authors draw conclusions regarding how to increase the transparency of AI models. In particular, they highlight that:

  • The AI ethical guidelines of 16 organizations emphasize explainability as the core of transparency.
  • When defining explainability requirements, it is important to use multi-disciplinary teams.

The define a four-quandrant model for explainability of requirements and AI systems. The model links four key questions to a number of aspects:

  1. What to explain (e.g., roles and capabilities of AI).
  2. In what kind of situation (e.g., when testing).
  3. Who explains (e.g., AI explains itself).
  4. To whom to explain (e.g., customers).

It’s an interesting reading that takes AI systems to more practical levels and provide the ability to turn explainability into software requirements.

Defect predictions – still valid in 2023…

Image by WikiImages from Pixabay

Industrial applications of software defect prediction using machine learning: A business-driven systematic literature review – ScienceDirect

Wow, when I look at the last entry, it was two months ago. Well, somewhere between the course in embedded systems for my students, delegation to Silicon Valley and all kinds of challenges, the time seemed to pass between my fingers.

Well, nevertheless, I would like to put a highlight to the article from our colleagues who specialize in defect predictions and systematic reviews. The article describes how companies use defect prediction models and when they do it.

It’s a nice sunday reading for those of you who are interested in the topic. It is a good source of best practices as well as a solid source for looking for datasets for defect prediction.

Enjoy your reading!

Creating your own models

BIld av Pexels från Pixabay

Last week I wrote about our seminar and Co-pilot. I’m sure that has stimulated a lot of thoughts on these language models. Many think that this is a difficult task to create, train and use them.

Nothing further from the truth. If you are interested in training such a model from scratch, I recommend the following book (in particular Chapter 4 if you are anxious to get started).

Transformers for Natural Language Processing | Packt (packtpub.com)

The book explains how these models work for natural language processing, but making it work for source code is trivial. Use your code instead of the provided text and there you go. You need a GPU or use some cloud service, otherwise you will wait forever.

But if you have it, you can get really cool results within a day or two.

Good luck!

GitHub Co-pilot and code generation

So, this week’s post is my reflection on the seminar that we hosted last week (the recording is above). It was an eye-opener for me in a few aspects.

For the first, it was the question of ownership of things. Since AI is not a subject in legal cases, it cannot really own anything. I know, AI and computational models are not the same, but for the sake of the argument let’s assume that they are. By the end of the day, it is still a human being that presses the button and generates new source code or comments or what have you. So, the responsibility is still very much on us when we use these tools.

The second, it was the question about the community and why we have open-source software. We certainly do not put our source code openly for someone to profit from it. Attribution and recognition are very important (if not the most important) aspects of any open-source community. So, using their code to create commercial models requires at least some attribution. Why not show which code was used to train these models and show how good the communities really are?

Finally, my main point still stands – we should use these models to become better. They make us so much more productive that we should not go back to the old ways of writing software. Providing suggestions and ideas to programmers can make our software better, shipped faster and potentially more reliable.

However, we need to make sure that we change the way we attribute the software. Myself, I will start to add “co-created by Github Co-pilot and the OSS communities” to my work when I use the tool. Maybe you can do that too? At least to give some attribution back to our countless colleagues who deserve it….

How this world works…. a book to read

https://www.amazon.se/How-World-Really-Works-Scientists/dp/B001XIIE24/ref=sr_1_1?qid=1673202514&refinements=p_27%3AVaclav+Smil&s=books&sr=1-1

So, the holidays are over, a new year starts, new resolutions are made, new projects started. But before we get all stuck in the work, I’d like to share a book suggestion to read on the go.

This book is about the current trends in the modern world. It discusses such aspects as the our dependency on technology, the way we use it to produce food and to make things. It talks about how the current supply chains get disrupted and what we need to do to maintain/regain the balance.

Finally, it talks about the energy, our dependence on the oil energy and on the nuclear power.

However, this is not a doomsday book, quite a contrary. It is a book about the hope in the development of the modern society and how we should contribute to it. I strongly recommend this book as a reading for the evenings, after hearing about the energy crisis. I recommend to take this book in and reflect on the fact that we have achieved a lot and the world is not as scary as the news want it to be, or create it to be.

I hope that you will enjoy the book as much as I do.