Cybersecurity has been, and will always be, a challenge for software systems. It is also perceived as an art when it comes to security analysis (or exploitation for that matter). There is no single tool, no single method that will make our software secure.
This article is interesting because of the way that it works. Usually, security analyzers are token-based analyzers which see programs as a set of instructions. They are very good, but they struggle with understanding the context of the analyzed program.
Let me give you an example. We’re analyzing a program for SQL injections – a very simple vulnerability. We can check that the SQL statement in the code contains any parameters. If it does not, then it’s safe – we know what we do with the database, but it’s not very common (or even useful). So, most statements will have some sort of parameters, and this is where the tricky part is. These parameters need to be validated, but this validation can be done in the same function (just before the actual SQL statement) or it can be done somewhere in the calling function/method. The check in the calling function/method is the part where token-based security analyzers give up.
I’ve written about programming language models before, and it is no secret that I am very much into this topic. I like the way in which software engineering evolves – we become a more mature discipline and our tools become smarter by the hour (at least that’s how it feels).
This paper presents a new language model that is capable of doing code edits, i.e., such things as bug fixes. The model is essentially a transformer with an architecture that has been published before. However, the strength of this model is in the way in which it is trained. It uses so-called edit plans to train the model to change the input code, rather than to complement it.
The difference may not sound like much, but it is significant. The existing models are trained to complete code sequences and therefore they are very good in generating code. However, when given a code that does not require any generation, they tend to copy the input sequence to the output sequence. Well, not very useful that is.
Thanks to this new way of training, the model is able to edit code, remove defects, address review comments and so on. Yes, address review comments, this is not a joke. I sincerely believe that we can use this in practice in our tools one day.
One of the most prominent problems with using research results in practice is the lack of replication packages, but this is far from being the only one. Another one, maybe an equally important problem, is the fact that the studies report performance in many different ways.
The paper that I wish to bring up is trying to address a similar aspect of software engineering. The paper reviews existing studies that provide recommendations, e.g., to report confusion matrices or to report statistical significance tests. Then it reviews some of the papers published in respected venues and then it provides actionable guidelines on how to evaluate the performance of machine learning models.