How good are language models for source code tasks?

Using machine learning, and deep learning in particular, for software engineering tasks exploded recently. I would say that it exploded a bit too much. I’m myself to blame here as our team was one of the early adopters with the CCFlex model and source code analysis.

Well, this paper compares a number of modern deep learning models, so called transformers, in various code and comment analysis tasks. The authors did a great job in collecting a set of models and datasets, trained them and critically evaluated the performance.

I recommend reading the entire paper, but what they found was a bit surprised for me. First of all, they found that the transformer models are better for the natural language and not so great for the source code analysis. The hypothesis is that the structure of programs is important here. They have also found that pre-training is important, but not crucial. Pre-training attributes to a moderate effect in the end. The dataset, and its content, is much more important for the task at hand.

This is a great paper and I hope that this can become an essential reading for software engineers working with AI systems engineering supporting the software engineering tasks.

Author: Miroslaw Staron

I’m professor in Software Engineering at IT faculty. I usually blog about interesting articles (for me) and my own reflections on the development of Software Engineering, AI, computer science and automotive software.