Federated learning in code summarization…

3661167.3661210 (acm.org)

So far, we have explored two different kinds of code summarization – either using a pre-trained model or training our own. However, both have severe limitations. The pre-trained models are often good, but too generic for the project at hand. The private models are good, but often require a lot of high-quality data and processing power. In this article, the authors propose a third way – federated learning.

The results show that:

  • Fine-tuning LLMs with few parameters significantly improved code summarization capabilities. LoRA fine-tuning on 0.062% of parameters showed substantial performance gains in metrics like C-BLEU, METEOR, and ROUGE-L.
  • The federated model matched the performance of the centrally trained model within two federated rounds, indicating the viability of the federated approach for code summarization tasks.
  • The federated model achieved optimal performance at round 7, demonstrating that federated learning can be an effective method for training LLMs.
  • Federated fine-tuning on modest hardware (40GB GPU RAM) was feasible and efficient, with manageable run-times and memory consumption.
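
Out of curiosity, here is what the two ingredients could look like in code. This is a minimal sketch, not the authors' implementation: LoRA fine-tuning via Hugging Face PEFT plus a naive FedAvg step that averages the clients' adapter weights. The base model name and the target modules are assumptions on my part.

```python
# Minimal sketch (not the paper's code): LoRA fine-tuning + naive federated averaging.
# The base model and target_modules are placeholders -- adjust them to your own model.
import torch
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")  # example base model
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q", "v"],   # attention projections in T5-style models
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()             # only a fraction of a percent is trainable

def fed_avg(client_states):
    """Average the (LoRA-only) state dicts that come back from the clients."""
    avg = {}
    for key in client_states[0]:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg

# Each federated round: send the adapter to clients, fine-tune locally,
# collect the LoRA weights, and average them:
# global_lora = fed_avg([client_1_lora_state, client_2_lora_state, ...])
```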

I need to take a look at this model a bit more since I like this idea. Maybe this is the beginning of the personalized bot-team that I always dreamt of?

On the Use of ChatGPT for Code Review (article review)

3661167.3661183 (acm.org)

CoPilot and other tools have been used increasingly often, but mostly for testing and for programming. Now, the question is whether these kinds of tools help much in tasks such as code review.

This paper explores how developers utilize ChatGPT in the code review process and their reactions to the AI-generated feedback. This research analyzed 229 review comments from 205 pull requests across 179 projects to understand the purposes and effectiveness of ChatGPT in code review.

They found that:

  • Developers primarily use ChatGPT for two main purposes: referencing and outsourcing. These purposes were further categorized into several sub-categories:
  • Referencing: developers used ChatGPT to gain understanding and to support their opinions, including tasks such as refactoring, implementation, design, non-programming tasks, testing, documentation, and others.
  • Outsourcing: developers directly asked ChatGPT to resolve specific issues. This included implementation, refactoring, bug-fixing, reviewing, testing, design, documentation, and other tasks.

The study found a mixed reaction to ChatGPT-generated reviews, which is not really surprising given that it is a new technology and code reviews are not only for reviewing, but also for learning:

  • Positive Reactions (64%): A majority of the responses were positive, indicating that developers found the AI’s suggestions helpful.
  • Negative Reactions (30.7%): A significant portion of responses were negative. The primary reasons for dissatisfaction included the solutions not bringing extra benefits, containing bugs, or not aligning with developers’ preferred coding styles.

However, here is what I find really interesting:

  • Enhanced Prompt Strategies: Effective use of ChatGPT requires well-crafted prompts to maximize the AI’s potential in generating useful code reviews.
  • Tool Integration: Integrating ChatGPT with existing development tools can streamline the review process.
  • Continuous Monitoring: Regular assessment and refinement of ChatGPT’s outputs are necessary to ensure high-quality code reviews.
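
To make the "well-crafted prompt" point concrete, here is a small sketch of what such a review prompt could look like against the OpenAI chat API. The prompt wording and the model name are my own choices for illustration, not something the paper prescribes.

```python
# Sketch of a structured code-review prompt: give the model the diff, the review
# goal, and the expected output format. Model name is just an example.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

REVIEW_PROMPT = """You are reviewing a pull request.
Focus on: correctness, readability, and adherence to our Python style guide.
For each issue, give the line, the problem, and a concrete suggestion.
Do not restate the diff; do not invent code that is not shown.

Diff:
{diff}
"""

def review_diff(diff: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # any chat-capable model works here
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(diff=diff)}],
        temperature=0.2,                     # keep the review focused
    )
    return response.choices[0].message.content

# print(review_diff(open("change.diff").read()))
```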

These three points are kind of cool, because they mean that we need to learn how to instruct and use these tools. That means that we lose something (knowledge about our products), while we need to learn more general skills about prompting…

Investigating Large Language Models for Code Generation on GitHub (article review)

https://arxiv.org/abs/2406.19544

Again, on the topic of generative AI for programming. I’ve found this interesting article that reviews the state of adoption. It examines the use of large language models (LLMs) like ChatGPT and GitHub Copilot in software development. In short, they find that:

  1. ChatGPT and Copilot dominate code generation on GitHub, primarily for small projects led by individuals or small teams.
  2. These tools are mainly used for Python, Java, and TypeScript, generating short, low-complexity code snippets.
  3. Projects with LLM-generated code evolve continuously but exhibit fewer bug-related modifications.

So, although so many LLMs exist, it is still ChatGPT and CoPilot that have the largest share of the market. IMHO this is because of the ecosystem. It’s not enough to have an LLM; we also need to be able to access the internet, interact with the model, and get it trained on our own examples.

Volvo Cars and CoPilot

Developers are Happier and More Satisfied in Their Coding Environment (microsoft.com)

I rarely summarize other blog articles, but this one is an exception. I felt that things like this have been in the making, so it is no surprise. Well, a bit of a surprise, as this seems to be an experience report on super-modern technology in a business where software has long taken second place.

Based on the article, six months into its rollout, developers have reported significant efficiency gains, with some tasks like unit testing seeing up to a 40% increase in productivity. Copilot’s ability to assist with testing, explaining, and generating code has allowed developers to spend more time in a “flow state,” enhancing creativity and problem-solving.

Developers at Volvo Cars are happier and find their work more enjoyable, with 75% noting increased satisfaction. The tool has also improved communication among team members, fostering better interactions and sharper problem-solving.

Anyways, this shows that companies are no longer afraid of using generative AI technologies in practice. Let’s just wait and see more of this.

Software defined vehicles, truck style?

Image is taken from the presentation of Jonn Lantz, AB Volvo.

Today, we had the Software Center reporting workshop, where we talked about software development and what it will look like in the future. The picture above shows how important software is in today’s trucks.

In his keynote, our colleague showed how to design software at large scale, where the commodity parts are important but innovation is what shines through; in a world where platforms matter but do not get the attention they need.

This kind of approach means that you must be able to grasp both. One must design the software to support all kinds of features that are relevant today and may be relevant tomorrow. When I see this, I think about ChatGPT, where the platform is the ChatGPT model that allows us to create our own GPTs on top of it.

This also reminds me about platforms like Ollama or Torch, which allow us to build products fast, customized to our needs. We can grab models, share them, train them, and (for a small fee) even deploy models based on such a platform.
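
Just to illustrate the "grab a model and build on it" part, here is a tiny sketch using the ollama Python client; the model name is only an example, and it assumes the Ollama server is running locally with that model already pulled.

```python
# Minimal sketch: ask a locally hosted model a question via the ollama client.
# Assumes `ollama serve` is running and the model has been pulled beforehand.
import ollama

response = ollama.chat(
    model="llama3",   # example model name -- any locally available model works
    messages=[{"role": "user", "content": "Summarize what a software platform is in one sentence."}],
)
print(response["message"]["content"])
```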

Back from being hacked…

So, the blog was out for a while. Turned out that the web server that hosted the website was hacked. You could find it ironic, I find it annoying. Here is why.

First of all, we, as a university, outsource this kind of tech to other actors. It makes no sense to build competence in maintaining web servers locally. Yes, we do have the main website, but we should focus on research, education and outreach. So, we trust the partners to know what they are doing. Turns out this may not always be the case.

Second, this shows that no one is immune any more. The recent attacks on Primula show that this is becoming an increasing problem (Inga personuppgifter läckte i hackerattacken (di.se) – “No personal data leaked in the hacker attack”).

In the work of my team, we try to make these attacks harder to perform. We create methods and tools that allow us to check whether software is secure — see this docker container: miroslawstaron/ccsat – Docker Image | Docker Hub. You can use these kinds of tools to check whether the software that YOU construct is secure, but you can never really be sure about the entire supply chain. Your software may be secure, use MFA and other mechanisms, but if your supplier is vulnerable – there is not much you can do.

So, with these words of advice – stay safe and keep back-ups!

Sketches to models…

Image by 127071 from Pixabay

https://www.computer.org/csdl/proceedings-article/models/2023/248000a173/1SOLExN0XaU

It’s been a while since I worked with models and I looked a bit at how things have evolved. As I remember, one of the major problems with modelling was one of its broken promises – simplicity.

The whole idea of modelling was to be able to sketch things, discuss candidate solutions, and then transfer them into a proper model. However, in practice, it never worked like that – the sheer process of transferring a solution from the whiteboard to a computer took time. Maybe even so much time that the informal sketches were not really worth the effort.

Now, we have CNNs and all kinds of ML algorithms, so why not use that? This paper studies exactly this.

The paper “SkeMo: Sketch Modeling for Real-Time Model Component Generation” by Alisha Sharma Chapai and Eric J. Rapos presents an approach for automated, real-time generation of model components from sketches. The approach is based on a convolutional neural network that classifies sketches into model components, and it is integrated into a web-based model editor that supports a touch interface. At the moment, the tool supports classes with their properties (including methods and attributes) and the relationships between them; the prototype also allows updating models via non-sketch interactions.

The tool, SkeMo, has been validated both by measuring the accuracy of the classifier (the convolutional neural network) and through a user study with human participants. During the evaluation, the classifier achieved an average precision of over 97%, and the user study indicated an average accuracy of 94%, with six participants reaching 100%. This study shows how we can successfully employ machine learning in the modeling process to make it more natural and agile for the users.
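
To make the idea tangible, here is a minimal sketch (not the SkeMo architecture) of a small CNN that maps a rasterized whiteboard sketch to one of a few model-element classes; the class names and input size are placeholders of mine.

```python
# Toy PyTorch classifier: 64x64 grayscale sketch -> one of a few model-element classes.
# The class names are hypothetical, not the categories used in the paper.
import torch
import torch.nn as nn

CLASSES = ["class_box", "attribute", "method", "association", "inheritance"]

class SketchClassifier(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):                      # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))

model = SketchClassifier()
logits = model(torch.randn(1, 1, 64, 64))      # one fake sketch
print(CLASSES[logits.argmax(dim=1).item()])
```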

Modelling digital twins…

Image by 652234 from Pixabay

https://www.computer.org/csdl/proceedings-article/models/2023/248000a013/1SOLEPphpHa

Digital twins are becoming increasingly important. They provide a possibility to monitor the real twin without costly measurements or sending technicians to the site where the real twin is located. However, developing them is not so easy, and it is almost a one-off effort for every twin pair.

The paper “A Model-driven Approach for Knowledge-based Engineering of Industrial Digital Twins” presents a new approach to constructing digital twins for factories. Authored by Sushant Vale, Sreedhar Reddy, Sivakumar Subramanian, Subhrojyoti Roy Chaudhuri, Sri Harsha Nistala, Anirudh Deodhar, and Venkataramana Runkana, it introduces a method that enhances efficiency of monitoring and predictive maintenance of industrial plants.

Typically, digital twins are created manually for each plant, which is a labor-intensive process. This paper proposes a model-driven method, structured on three levels of abstraction: the meta-level, plant-type level, and plant-instance level. The meta-level outlines universal structures and vocabulary, the plant-type level focuses on knowledge specific to various plant types, and the plant-instance level details a digital twin for a specific plant. These levels correspond to different user roles: platform builders, plant type experts, and plant experts, respectively. This hierarchical structure enables element reuse across different plants and types, streamlining the digital twin development process. The effectiveness of this method is exemplified in a case study of an iron ore sinter plant.

The process begins with establishing high-level Key Performance Indicators (KPIs) such as sinter throughput or reduction degradation index. These KPIs are then translated into a mathematical model, followed by a causal graph, and finally, a digital twin design/model. Remarkably, this approach significantly reduced the time required to formulate the quality optimization problem to approximately one week, down from two months, marking a substantial improvement in efficiency. In conclusion, this paper demonstrates the substantial advantages of a multi-level modeling approach in designing digital twins, offering a more efficient, standardized, and scalable solution.
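
To see why the three levels help with reuse, here is a toy illustration in Python – not the authors' metamodel: the meta-level defines the generic vocabulary, the plant-type level captures sinter-plant knowledge once, and the plant-instance level binds it to a single (hypothetical) site.

```python
# Toy illustration of the three abstraction levels for a digital twin.
from dataclasses import dataclass, field

# Meta-level: generic vocabulary shared by all plants
@dataclass
class KPI:
    name: str
    unit: str

@dataclass
class Equipment:
    name: str
    kpis: list[KPI] = field(default_factory=list)

# Plant-type level: knowledge about sinter plants in general (reusable across sites)
def sinter_plant_type() -> list[Equipment]:
    throughput = KPI("sinter throughput", "t/h")
    rdi = KPI("reduction degradation index", "%")
    return [Equipment("sinter strand", [throughput, rdi])]

# Plant-instance level: a digital twin for one concrete (hypothetical) plant
@dataclass
class PlantTwin:
    site: str
    equipment: list[Equipment]

twin = PlantTwin(site="Plant A (hypothetical)", equipment=sinter_plant_type())
print(twin)
```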

Generating documentation from notebooks

https://github.com/jyothivedurada/jyothivedurada.github.io/blob/main/papers/Cell2Doc.pdf

Understanding code is the same regardless of whether it is in a Jupyter notebook or in another editor. Comments and documentation are the key. I try to teach that to my students and, some of them at least, appreciate it. Here is a paper that can change this for the better without adding more effort.

This paper introduces a machine learning pipeline that automatically generates documentation for Python code cells in data science notebooks. Here’s a more casual summary of what they did and found:

  1. The Solution – Cell2Doc: The team whipped up a new tool called Cell2Doc. It’s a smart pipeline that breaks down code cells into logical parts and documents each bit separately. This way, it gets more details and can explain complex code better than other tools.
  2. How It Works: Cell2Doc has two main parts. First, a Code Segmentation Model (CoSEG) chops up the code into chunks that make sense on their own. Then, a Code Documentation Model (CoDoc) writes up explanations for each chunk. In the end, you get a full set of docs that covers everything the code is doing.
  3. The Cool Part: This isn’t just about slapping together existing models. Cell2Doc actually makes them better at writing docs for code. It’s like giving a turbo boost to the models so they can catch more details and write clearer explanations.
  4. Testing It Out: They didn’t just build this and hope for the best. They tested it with real data from Kaggle, a place where data scientists hang out and compete. They even made a new dataset for this kind of task because the old ones weren’t cutting it.
  5. The Results: When they put Cell2Doc to the test, it did a bang-up job. It scored way higher on automated tests than other methods, and real humans liked it better too. It was better at being correct, informative, and easy to read.
  6. Sharing Is Caring: They’re not keeping this to themselves. They’ve shared Cell2Doc so anyone can use it to make their code easier to understand.

In a nutshell, Cell2Doc is like a super-smart assistant that takes the headache out of writing docs for your code. It understands the code deeply and explains it in a way that’s easy to get, which is pretty awesome for keeping things clear and making sure your work can be used by others.
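
For the curious, here is a toy version of the same two-stage idea – not the Cell2Doc implementation: split a notebook cell into chunks (here naively, on blank lines) and let a code-summarization model document each chunk. The checkpoint name is just an example of a code-to-text model, not the one from the paper.

```python
# Toy two-stage pipeline: naive segmentation + per-chunk code summarization.
from transformers import pipeline

summarizer = pipeline("summarization", model="Salesforce/codet5-base-multi-sum")

def segment_cell(cell_source: str) -> list[str]:
    """Stand-in for CoSEG: split the cell on blank lines into logical chunks."""
    chunks = [c.strip() for c in cell_source.split("\n\n")]
    return [c for c in chunks if c]

def document_cell(cell_source: str) -> str:
    """Stand-in for CoDoc: one generated sentence per chunk."""
    lines = []
    for chunk in segment_cell(cell_source):
        doc = summarizer(chunk, max_length=30)[0]["summary_text"]
        lines.append(f"# {doc}")
    return "\n".join(lines)

cell = """
df = pd.read_csv("train.csv")
df = df.dropna()

model = RandomForestClassifier().fit(df[FEATURES], df["label"])
"""
print(document_cell(cell))
```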

I am considering giving this tool to my students in the spring, when they learn how to program embedded systems in C.

Log files and anomalies, once again…

https://arxiv.org/pdf/2308.09324.pdf

I’ve written about log files a while back, but I think I’m getting hooked on the topic. It is actually quite interesting how to use them in practice. So, here is one more paper from the ASE 2023 conference.

This paper presents a new way to create log data that can help spot problems in software systems. Here’s a more casual rundown of what the paper is about:

  1. The Problem: Keeping software reliable is tough, especially when you don’t have enough good examples of system logs to train your anomaly detection tools. The logs you can get your hands on either have privacy issues or are too simple and don’t reflect real-world complexity.
  2. The Solution – AutoLog: The researchers have cooked up AutoLog, a clever method that doesn’t need to run the actual system to generate logs. It’s like a simulation game that creates realistic log data by analyzing the code of an application.
  3. How AutoLog Rolls: It works in three steps. First, it digs through the code to find all the spots where logs might happen. Then, it figures out which parts of the code could lead to those logs. Finally, it walks through these paths, creating log data that looks like it came from a real running system.
  4. The Cool Bits: AutoLog can make a lot more log events than other methods, and it does it super fast. It’s like having a log event factory that can churn out thousands of messages a minute.
  5. Flexibility for the Win: You can tweak AutoLog to simulate different scenarios, like changing the amount of data, the mix of normal and weird events, or focusing on specific parts of the system.
  6. Real-World Ready: When tested on 50 Java projects, AutoLog’s logs helped anomaly detection tools perform a bit better. It’s like giving a detective better clues to solve a case.
  7. Sharing is Caring: The team has shared AutoLog for others to use, hoping it’ll help make software more reliable by giving developers better tools for testing and benchmarking.

In short, AutoLog is a new tool for creating fake but realistic logs that can help find bugs in software without the need to mess with privacy or oversimplified data. It’s a game-changer for making sure software runs smoothly.
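
To get a feel for the three steps, here is a toy illustration – not AutoLog itself: find logging call sites statically with Python's ast module, pretend each one is a reachable point on some execution path, and emit synthetic, timestamped log lines.

```python
# Toy illustration of the AutoLog idea: static discovery of log sites + synthetic log generation.
import ast
import random
from datetime import datetime, timedelta

SOURCE = '''
def handle(request):
    logging.info("request received")
    if not request:
        logging.error("empty request")
    logging.info("request handled")
'''

def find_log_sites(source: str) -> list[tuple[str, str]]:
    """Step 1: collect the level and message of every logging.<level>() call."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "logging"
                and node.args
                and isinstance(node.args[0], ast.Constant)):
            sites.append((node.func.attr.upper(), node.args[0].value))
    return sites

def generate_logs(sites, n=5):
    """Steps 2-3: pretend to walk execution paths and emit timestamped lines."""
    t = datetime(2024, 1, 1)
    for _ in range(n):
        level, message = random.choice(sites)
        t += timedelta(seconds=random.randint(1, 10))
        print(f"{t.isoformat()} {level} {message}")

generate_logs(find_log_sites(SOURCE))
```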

I need to take this tool for a spin during the upcoming break.