When it gets too much or Revenge of the Tipping point…

BIld av Katja S. Verhoeven från Pixabay

https://www.bokus.com/bok/9780316575805/revenge-of-the-tipping-point-overstories-superspreaders-and-the-rise-of-social-engineering/?utm_campaign=Performance%20Max%20%7C%20English%20%7C%20Rooth&gad_source=1&gclid=Cj0KCQiAuou6BhDhARIsAIfgrn4spmK1A21PF2Luov0HXzMwMFMsTcJKUSsvnIH5UEfxDs_lBz3TOUMaAuLEEALw_wcB

I’ve just finished reading this great book about the way in which the tipping point tips to the wrong side. It’s mostly about the law of “The large effect of the few” as Malcolm Gladwell puts it. In short, this law means that in certain situations, it’s the minority that is responsible for large effects. For example, the minority of old, badly maintained cars that contribute to to over 55% of pollution in one of the US cities. It’s about when one person, a superspreader, ends up in very specific conditions that allow this person to spread the contagion of the COVID virus at the beginning of the pandemic.

Now, we see that in software engineering a lot when we look at the tooling that we use. Let’s take the CI/CD tool Jenkins as an example. It is one of many different tools that were on the market at that time. It was not even the major one, but it was a sibling to a professional tool that was maintained by Oracle (if I recall correctly). Yet, it became very popular and the other tools did not. Since they were siblings, they were not worse, not better either; maybe a little different. What made it tip was the adoption of this tool in the community. A few superspreaders started to use it and discovered how good the tool is for automation of CI/CD tasks.

I see the same parallel to AI today. What was it that tipped the use of AI? IMHO it was a few things:

  1. Google’s LSTM use in Search – since there was a commercial value, it made sense to adopt it. Commercial adoption means business value, improvement and management focus (funding).
  2. Big data – after almost a decade of talking about big data, collecting it and indexing it, we were ready to provide the data-hungry modules with the data they needed to do something useful.
  3. HuggingFace – our ability to share models and use them without requirements on costly GPUs and large (and good) datasets.
  4. Access to competence – since we have so many skilled computer scientists and software engineers, it was easy to get hold of the competence needed to turn ideas into products. Google’s Deepmind is a perfect example of it. People behind it got the Nobel Prize.

Well, the rest is history as they say…. But, what will the next invention on the verge of the tipping point be?

How to create a tribe…

Conspiracy: A History of Boll*cks Theories, and How Not to Fall for Them: Phillips, Tom, Elledge, Jonn: 9781472283405: Amazon.com: Books

During the holidays, I’ve had some time to read books outside of software engineering. This one caught my attention, not because I like conspiracy theories, but mostly because I got interested in how theories actually spread.

Now, while this book is about conspiracies, don’t get me wrong, it is also about creating communities and making things spread. Whether this is a conspiracy or maybe just a normal culture, it is interesting to read.

Anyways, if you have some time on your hands, instead of scrolling Instagram or Facebook feeds, take a look at this book. I guarantee that it will provide even more fun.

Commander programmer and his army of bots… – Software Engineering in a post-GenAI era

Engineering software today becomes a profession where AI can make the most difference. GitHub and the availability of open source code, as well as natural language texts, provided the best possible fuel for creating large and great models. Some predict that we will not need programmers any more, but I predict that we will need them. I also think that these programmers will be much better than they are today. I view the future as quite optimistic and bright for our profession.

To begin with, I see the profession to change in a way that one single programmer will be more like a team leader for a number of LLMs or bots – hence the title. The programmer will use one model/bot for extracting requirements from standards, documents, twitters or other text sources. The requirement bot will help the programmer to find the user stories/requirements, to prioritize them and to set up a backlog.

Then, the programmer will use another bot to create high-level design. The bot will provide an initial design and will provide the programmer with some sort of conversational interface to reason about the design – the patterns, the architectural styles and the typical non-functional characteristics to maintain for the software.

At the same time, the programmer will use either the same bot or another one to write the source code of the software. We already use tools like GitHub CoPilot for these tasks and these tools will be even better. When constructing the software, the programmer will also use testing bots to create test cases and to improve the programs. Here, the work on program repair is definitely very interesting as it provides the ability to automatically improve the low-level design of the software.

Finally, when the software is complete, then during the release, the programmer will use bots to monitor the software. Finding defects fast, monitoring the performance, availability and security of software will be delegated to even more bots. They will do the work faster than any programmer anyways.

The job of the programmer will then be to instruct, integrate and monitor these bots. A little bit like a team leader who needs to make sure that all team members agree on certain principles. This means that the programmers will need to know more about the domains where they work as well as they will need access to tooling that supports their workflows.

What will also change is our perception of quality of software. If we can use these bots and make the software construction much faster, then we probably will not need to write super-maintainable code – the bots will be able to decipher even the most obscure code and help us to improve it. Hey, maybe these bots will even write a completely new version of our software once they learn how to optimize their design and when we learn how to link these bots to one another. Regardless, I do not think they will be able to write a Skynet kind of code….

On Security Weaknesses and Vulnerabilities inDeep Learning Systems

Image: wordcloud based on the text of the article

In the rapidly evolving artificial intelligence (AI), deep learning (DL) has become a cornerstone, driving advancements based on transformers and diffusers. However, the security of AI-enabled systems, particularly those utilizing deep learning techniques, are still questioned.

The authors conducted an extensive study, analyzing 3,049 vulnerabilities from the Common Vulnerabilities and Exposures (CVE) database and other sources. They employed a two-stream data analysis framework to identify patterns and understand the nature of these vulnerabilities. Their findings reveal that the decentralized and fragmented nature of DL frameworks contributes significantly to the security challenges.

The empirical study uncovered several patterns in DL vulnerabilities. Many issues stem from improper input validation, insecure dependencies, and inadequate security configurations. Additionally, the complexity of DL models makes it harder to apply conventional security measures effectively. The decentralized development environment further exacerbates these issues, as it leads to inconsistent security practices and fragmented responsibility.

It does make sense then to put a bit of effort into securing such systems. By the end of the day, input validation is no rocket science.

Federated learning in code summarization…

3661167.3661210 (acm.org)

So far, we have explored two different kinds of code summarization – either using a pre-trained model or training our own. However, both of them have severe limitations. The pre-trained models are often good, but too generic for the project at hand. The private models are good, but often require a lot of good data and processing power. In this article, the authors propose to use a third way – federated learning.

The results show that:

  • Fine-tuning LLMs with few parameters significantly improved code summarization capabilities. LoRA fine-tuning on 0.062% of parameters showed substantial performance gains in metrics like C-BLEU, METEOR, and ROUGE-L.
  • The federated model matched the performance of the centrally trained model within two federated rounds, indicating the viability of the federated approach for code summarization tasks.
  • The federated model achieved optimal performance at round 7, demonstrating that federated learning can be an effective method for training LLMs.
  • Federated fine-tuning on modest hardware (40GB GPU RAM) was feasible and efficient, with manageable run-times and memory consumption.

I need to take a look at this model a bit more since I like this idea. Maybe this is the beginning of the personalized bot-team that I always dreamt of?

On the Use of ChatGPT for Code Review (article review)

3661167.3661183 (acm.org)

CoPilot and other tools have been used increasingly often, but mostly for testing and for programming. Now, the questions is whether these kind of tools help much in such tasks as code review.

This paper explores how developers utilize ChatGPT in the code review process and their reactions to the AI-generated feedback. This research analyzed 229 review comments from 205 pull requests across 179 projects to understand the purposes and effectiveness of ChatGPT in code review.

The found that:

  • Developers primarily use ChatGPT for two main purposes: referencing and outsourcing. These purposes were further categorized into several sub-categories:
  • Developers used ChatGPT to gain understanding and support their opinions, including tasks such as refactoring, implementation, design, non-programming tasks, testing, documentation, and others.
  • Developers directly asked ChatGPT to resolve specific issues. This included implementation, refactoring, bug-fixing, reviewing, testing, design, documentation, and other tasks.

The study found a mixed reaction to ChatGPT-generated reviews, which is not really surprised given that it is a new technology and code reviews are not only for review, but also for learning:

  • Positive Reactions (64%): A majority of the responses were positive, indicating that developers found the AI’s suggestions helpful.
  • Negative Reactions (30.7%): A significant portion of responses were negative. The primary reasons for dissatisfaction included the solutions not bringing extra benefits, containing bugs, or not aligning with developers’ preferred coding styles.

However, what I find really interesting.

  • Enhanced Prompt Strategies: Effective use of ChatGPT requires well-crafted prompts to maximize the AI’s potential in generating useful code reviews.
  • Tool Integration: Integrating ChatGPT with existing development tools can streamline the review process.
  • Continuous Monitoring: Regular assessment and refinement of ChatGPT’s outputs are necessary to ensure high-quality code reviews.

These three points are kind of cool, because they mean that we need to learn how to instruct and use these tools. That means that we loose something (knowledge about our products), while we need to learn more general skills about prompting…

Investigating Large Language Models for Code Generation on GitHub (article review)

https://arxiv.org/abs/2406.19544

Again, on the topic of generative AI for programming. I’ve found this interesting article that reviewed the state of the adoption. It examines the use of large language models (LLMs) like ChatGPT and GitHub Copilot in software development. In short, they find that:

  1. ChatGPT and Copilot dominate code generation on GitHub, primarily for small projects led by individuals or small teams.
  2. These tools are mainly used for Python, Java, and TypeScript, generating short, low-complexity code snippets.
  3. Projects with LLM-generated code evolve continuously but exhibit fewer bug-related modifications.

So, although so many LLMs exist, it is still ChatGPT and CoPilot that have the largest share of the market. IMHO this is because of the ecosystem. It’s not enough to have an LLM, but we need to be able to access internet, interact with the model and also get it to be trained using our examples.

Volvo Cars and CoPilot

Developers are Happier and More Satisfied in Their Coding Environment (microsoft.com)

I rarely summarize other blogg articles, but this one is an exception. I felt that things like that have been in the making, so this one is no surprise. Well, a bit of surprise, as this seems to be an experience of super-modern technology in a business where software has long been on the second place.

Based on the article, six months into its rollout, developers have reported significant efficiency gains, with some tasks like unit testing seeing up to a 40% increase in productivity. Copilot’s ability to assist with testing, explaining, and generating code has allowed developers to spend more time in a “flow state,” enhancing creativity and problem-solving.

Developers at Volvo Cars are happier and find their work more enjoyable, with 75% noting increased satisfaction. The tool has also improved communication among team members, fostering better interactions and sharper problem-solving.

Anyways, this shows that companies are no longer affraid of using generative AI technologies in practice. Let’s just wait and more of this.

Software defined vehicles, truck style?

Image is taken from the presentation of Jonn Lantz, AB Volvo.

Today, we had the Software Center reporting workshop, where we talked about software development and how it will look like in the future. The picture above shows how important the software is in the current truck.

In his keynote, our colleague showed how to design software in the large scale, when the commodity is important, but innovation is what shines out; in the world, where the platforms are important, but do not get the attention that they need.

This kind of approach means that you must be able to grasp both. One must design the software to meet all kinds of features that are relvant today and may be relevant tomorrow. When I see this, I think about ChatGPT, where the platform is the ChatGPT model that allows us to create own GPT-s based on that platform.

This also reminds me about platforms like Ollama or Torch, which allow us to build products fasts and customized to our needs. We can grab models, share them, train them, and (for a small fee) we can even deploy models based on this platform.

Back from the hacked…

So, the blog was out for a while. Turned out that the web server that hosted the website was hacked. You could find it ironic, I find it annoying. Here is why.

First of all, we, as a university, outsource this kind of tech to other actors. It makes no sense to build competence about maintaining web servers locally. Yes, we do have the main website, but we should focus on research, education and outreach. So, we trust the partners that they know what they are doing. Turns out this may not always be the case.

Second, this shows that no one is immune any more. The recent attacks on Primula show that this becomes an increased problem (Inga personuppgifter läckte i hackerattacken (di.se)).

In the work of my team, we try to ensure that these attacks are harder to perform. We create methods and tools that allow to check if the software is secure or not — see this docker container: miroslawstaron/ccsat – Docker Image | Docker Hub. You can use these kind of tools to check if the software that YOU construct is secure, but you can never really be sure about the entire supply chain. Your software may be secure, use MFA and other mechanisms, but if your supplier is vulnerable – not much you can do.

So, with this words of advice – stay safe and keep back-ups!