The book explains how these models work for natural language processing, but adapting them to source code is trivial. Use your own code instead of the provided text and there you go. You need a GPU or a cloud service, otherwise you will wait forever.
But if you have it, you can get really cool results within a day or two.
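If you want to try this, the first step is to turn your own code base into a plain-text corpus that can replace the book's example text. Here is a minimal Python sketch. It assumes the training script reads a single text file. The function name `build_code_corpus` and the output file name are my own and do not come from the book.

```python
from pathlib import Path


def build_code_corpus(source_dir: str, output_file: str, suffix: str = ".py") -> int:
    """Concatenate all files ending in `suffix` under `source_dir` into one text file.

    Files are separated by a blank line so that boundaries stay visible.
    Returns the number of files written.
    """
    files = sorted(Path(source_dir).rglob(f"*{suffix}"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in files:
            # Ignore undecodable bytes rather than failing on odd files.
            text = path.read_text(encoding="utf-8", errors="ignore")
            out.write(text.rstrip("\n") + "\n\n")
    return len(files)
```

For example, `build_code_corpus("my_project", "corpus.txt")` writes all `.py` files under `my_project` into `corpus.txt`. You would then point the training script at that file instead of the provided text.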
So, the holidays are over, a new year starts, new resolutions are made, new projects are started. But before we get stuck in work, I’d like to share a book suggestion to read on the go.
This book is about the current trends in the modern world. It discusses such aspects as our dependency on technology and the way we use it to produce food and to make things. It talks about how the current supply chains get disrupted and what we need to do to maintain or regain the balance.
Finally, it talks about energy: our dependence on oil and on nuclear power.
However, this is not a doomsday book, quite the contrary. It is a book about hope in the development of modern society and how we should contribute to it. I strongly recommend this book as evening reading, after hearing about the energy crisis. I recommend taking this book in and reflecting on the fact that we have achieved a lot and that the world is not as scary as the news wants it to be, or makes it out to be.
I hope that you will enjoy the book as much as I do.
Understanding programming languages is an important research topic in the area of programming language models. I’ve written before that there are ca. 50 programming language models that we can use in software engineering. OK, not all of them are equivalent and they are task-specific, but they are available, so we can use and customize them.
This article is a study done by our colleagues from the department. It’s too long to summarize in detail, but there are a few things that I like. First, it gives a good overview of the types of language models:
Tree-based representation: the program code is seen from the perspective of its Abstract Syntax Tree (AST); an example is the code2vec model
Graph-based models: the program code is seen as a directed graph, e.g., a control flow graph
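To show the difference between treating code as a flat sequence of tokens and treating it as a tree, here is a small Python sketch using the standard `tokenize` and `ast` modules. It only illustrates the two views. It is not how code2vec works internally, since code2vec, as far as I know, learns from paths between the leaves of the AST. The graph-based view is not shown.

```python
import ast
import io
import tokenize

SOURCE = "def add(a, b):\n    return a + b\n"


def token_view(source: str) -> list[str]:
    """Token-based view: the program as a flat sequence of lexical tokens."""
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
              tokenize.DEDENT, tokenize.ENDMARKER}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in layout]


def tree_view(source: str) -> list[str]:
    """Tree-based view: the AST node types, in the order ast.walk visits them."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


print(token_view(SOURCE))
print(tree_view(SOURCE))
```

The token view loses the nesting. The AST keeps it: the `BinOp` node is a child of `Return`, which in turn is a child of `FunctionDef`.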
Although I like this classification, I see that it misses one of the most prominent and most popular types – the NLP-based model. It is a type of model where the program code is seen as a set of sentences that carry some sort of meaning. It is a derivative of the token-based representation, but it is much more than that. Codex from OpenAI is an example of such a model.
Nevertheless, this study provides a very interesting set of examples of models and their applications. I sincerely suggest taking a look at this paper to understand how the models work and where they are best used.
Some of you may not know, but I started my career as a software tester, so I’ve done my share of defect tracking and fixing. Although it was a while ago (well, over 20 years ago to be frank), I still remember a thing or two. I guess it is like riding a bike. One thing that I remember is that we did not really need more tests, but smarter testing.
This paper, nevertheless, proposes a new type of testing – inline testing – which is meant to replace the use of printf(…) in code. Instead of printing the values of variables for debugging purposes, we can use the new framework to create small inline tests and execute them. The idea is simple and contributes to the maturity of our discipline.
By using inline tests, we can track the progress of our software development and the evolution of its quality. Since we can generate reports and use asserts, we could communicate our progress to quality management much better.
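I have not tried the framework yet, so I will not guess its API. To illustrate the idea in Python, here is a hypothetical sketch that is a simplified runtime variant of inline testing. Where we would once have put a `print()` next to a statement, we place a check, and the outcomes are collected in a report that can be summarized later. The `InlineReport` class and the `normalize_price` function are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InlineReport:
    """Collects the outcomes of inline checks instead of printing values."""
    results: list = field(default_factory=list)

    def check(self, label: str, actual, expected) -> None:
        # Record whether the value at this statement is what we expect.
        self.results.append((label, actual == expected))

    def summary(self) -> str:
        passed = sum(1 for _, ok in self.results if ok)
        return f"{passed}/{len(self.results)} inline checks passed"


REPORT = InlineReport()


def normalize_price(raw: str) -> float:
    cleaned = raw.strip().replace(",", ".")
    # Where we would once have written print(cleaned), we check it instead.
    REPORT.check("decimal comma replaced", "," in cleaned, False)
    return float(cleaned)
```

After a run, `REPORT.summary()` gives a one-line status that can go into a report. That is the kind of communication with quality management I have in mind above.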
I need to test this framework, especially since it works with Python, my new language of choice…
Cybersecurity has been, and will always be, a challenge for software systems. It is also perceived as an art when it comes to security analysis (or exploitation for that matter). There is no single tool, no single method that will make our software secure.
This article is interesting because of the way its approach works. Usually, security analyzers are token-based analyzers that see programs as a set of instructions. They are very good, but they struggle with understanding the context of the analyzed program.
Let me give you an example. We’re analyzing a program for SQL injections – a very simple vulnerability. We can check whether the SQL statement in the code contains any parameters. If it does not, then it’s safe, because we know exactly what we do with the database, but such statements are not very common (or even useful). So, most statements will have some sort of parameters, and this is where it gets tricky. These parameters need to be validated, but the validation can be done in the same function (just before the actual SQL statement) or somewhere in the calling function/method. The check in the calling function/method is where token-based security analyzers give up.
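To make the example concrete, here is a minimal Python sketch that uses the built-in `sqlite3` module and an in-memory database. The table and all function names are hypothetical. `find_user_unsafe` concatenates its parameter into the SQL text, so it looks vulnerable when analyzed on its own. `handle_request` validates the input before calling it, and that cross-function context is exactly what a token-based analyzer misses. `find_user_safe` shows the parameterized alternative.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable in isolation: user input becomes part of the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()


def is_valid_username(name: str) -> bool:
    # Toy validation rule for this example only.
    return name.isalnum()


def handle_request(conn: sqlite3.Connection, name: str) -> list:
    # The validation lives in the caller; looking only at find_user_unsafe,
    # an analyzer cannot see that the input was already checked here.
    if not is_valid_username(name):
        return []
    return find_user_unsafe(conn, name)
```

With the input `x' OR '1'='1`, `find_user_unsafe` returns every row. `handle_request` and `find_user_safe` return nothing. The `isalnum()` rule is only a toy; passing parameters to the driver is the more robust fix.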
I’ve written about programming language models before, and it is no secret that I am very much into this topic. I like the way in which software engineering evolves – we become a more mature discipline and our tools become smarter by the hour (at least that’s how it feels).
This paper presents a new language model that is capable of doing code edits, e.g., bug fixes. The model is essentially a transformer with a previously published architecture. However, the strength of this model is in the way it is trained. It uses so-called edit plans to train the model to change the input code, rather than to complete it.
The difference may not sound like much, but it is significant. The existing models are trained to complete code sequences, and therefore they are very good at generating code. However, when given code that does not require any generation, they tend to copy the input sequence to the output sequence. Well, not very useful, that is.
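The paper uses its own edit-plan representation, which I do not reproduce here. As an illustration of the general idea only, here is a Python sketch that uses the standard `difflib` module to describe a bug fix as a sequence of keep/replace/insert/delete operations over tokens. A model trained on such plans learns that most tokens are kept and only a few change, instead of regenerating the whole sequence. The token lists are a hypothetical example.

```python
import difflib


def edit_plan(before: list[str], after: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Describe how to turn `before` into `after` as a list of edit operations.

    Each step is (operation, old_tokens, new_tokens), where operation is one of
    difflib's opcode names: 'equal', 'replace', 'insert' or 'delete'.
    """
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return [(op, before[i1:i2], after[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()]


def apply_plan(plan: list[tuple[str, list[str], list[str]]]) -> list[str]:
    """Rebuild the edited sequence: keep 'equal' tokens, take the new tokens otherwise."""
    result = []
    for op, old, new in plan:
        result.extend(old if op == "equal" else new)
    return result


# A hypothetical bug fix: an assignment where an identity check was meant.
buggy = ["if", "x", "=", "None", ":"]
fixed = ["if", "x", "is", "None", ":"]
plan = edit_plan(buggy, fixed)
```

Here the plan has three steps, and only the middle one (`replace` of `=` with `is`) changes anything.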
Thanks to this new way of training, the model is able to edit code, remove defects, address review comments and so on. Yes, address review comments, this is not a joke. I sincerely believe that we can use this in practice in our tools one day.
Language models are powerful tools if you know how to use them. One of the areas where they can be used is recognizing security vulnerabilities. In this article, the authors look into six language models and test them.
The results show that there are more challenges than solutions in this area. The models can be applied to programming languages, but the problem lies with the examples and the ground truth. What is good about the paper is that it provides a good overview of the models and how they are used. The authors also look a bit deeper into why the limitations of the models occur.
It’s something that our team has also observed in other contexts, but I will talk about that on another occasion. Stay tuned.
After my last post, and the visit to the workshop at MDU, I realized that there are a few tools that can already be used automatically. This paper presents one of them.
What is interesting about this tool is that it works with GitHub workflows, so it’s compatible with many modern CI/CD pipelines. The tool analyzes workflows and looks for security vulnerabilities. For example, it checks whether you keep sensitive information (secrets) in a plain-text file used in the workflow, and whether the workflow enforces the “least privilege” principle.
So I find myself on the train again, this time heading towards MDU for their cybersecurity workshop. Not that I am an expert in cybersecurity, but I know a bit about programming and design. I also know enough to see that building a secure product has to start with designing for security, not only testing for it.
I stumbled upon this paper about a week ago, probably because it had been submitted to some conference and the pre-print became available. In the paper, the authors interview 10 developers and survey over 180 professionals about how they find security vulnerabilities during code reviews. I will not describe the entire article, although I wish I had the time to do that. Here are some of the highlights.
“Interviewees stated to disregard security aspects during code reviews due to their assumptions about the security dynamic of the application they develop.” – this is an interesting finding, as many companies see code reviews as the silver bullet of software quality assurance today. Yet, the developers do not review something that they think “someone else” reviews…
When it comes to the survey, the results show that the majority of software developers think about security during their code reviews. The majority of the developers also admit that there are no security experts reviewing their code, which is probably not great. Maybe we should have security experts do some code reviews? Maybe both the developers and the security specialists would learn something from one another?
Finally, I think that the survey puts a finger on one of the pain points in modern companies – support for specific aspects of code reviews. Developers would like more support for making better security evaluations. I can only speculate that this means in-depth training.
Well, very interesting reading. Let me get back to the paper, looking at the beautiful landscapes of Östergötland…
Code reviews are time-consuming. And effort-intensive. And boring. And needed. Depending on whom we ask, we get one of the above answers (well, 80% of the time). The reality is that code reviews are not the most productive activity. Reading the code and looking for defects is good when we do it once, but when we need to do it continuously, as part of continuous integration, the story changes. It becomes like studying for an exam or doing homework – we do everything else to postpone it. Then someone waits longer or the code quality suffers.
There has been a lot of work done to make this activity more fun – gamification, automated support, using machine learning to filter out the code that can be checked automatically – to name a few. As far as I know, there has not been much work on understanding what kinds of problems code reviews really find.
In this article, the authors address that very question. Admittedly, they only analyzed 7 OSS projects, but their results are still interesting: “We identified 116 defect types that we grouped into 15 groups to create a defect classification. Additionally, 38% of these defects could be automatically detected accurately.”
So, what are code reviews good for? Here is their list:
errors, warnings and logging,
logic and functionality
The list is sorted from the least frequent to the most frequent – so logic and functionality is the category for which code reviews are the most useful. I should also say that the frequencies are not very high – threading has only 1 detected concern, while logic and functionality has 57. So, you know, it could be more, given how much time is spent on code reviews. I guess that is what quality costs nowadays, even though there is no real data on this.