Investigating Large Language Models for Code Generation on GitHub (article review)

https://arxiv.org/abs/2406.19544

Once again on the topic of generative AI for programming: I’ve found an interesting article that reviews the state of adoption. It examines how large language models (LLMs) such as ChatGPT and GitHub Copilot are used in software development. In short, the authors find that:

  1. ChatGPT and Copilot dominate code generation on GitHub, primarily for small projects led by individuals or small teams.
  2. These tools are mainly used for Python, Java, and TypeScript, generating short, low-complexity code snippets.
  3. Projects with LLM-generated code evolve continuously but exhibit fewer bug-related modifications.

So, although many LLMs exist, ChatGPT and Copilot still have the largest share of the market. IMHO, this is because of the ecosystem. It is not enough to have an LLM; we also need to be able to access the internet, interact with the model, and get it trained on our own examples.

Author: Miroslaw Staron

I’m a professor of Software Engineering at the IT Faculty. I usually blog about articles I find interesting and about my own reflections on the development of Software Engineering, AI, computer science and automotive software.