On the Use of ChatGPT for Code Review (article review) – SE metrics (Software Engineering)

CoPilot and other tools have been used increasingly often, but mostly for testing and for programming. Now, the questions is whether these kind of tools help much in such tasks as code review.

This paper explores how developers utilize ChatGPT in the code review process and their reactions to the AI-generated feedback. This research analyzed 229 review comments from 205 pull requests across 179 projects to understand the purposes and effectiveness of ChatGPT in code review.

The found that:

Developers primarily use ChatGPT for two main purposes: referencing and outsourcing. These purposes were further categorized into several sub-categories:

Developers used ChatGPT to gain understanding and support their opinions, including tasks such as refactoring, implementation, design, non-programming tasks, testing, documentation, and others.
Developers directly asked ChatGPT to resolve specific issues. This included implementation, refactoring, bug-fixing, reviewing, testing, design, documentation, and other tasks.

The study found a mixed reaction to ChatGPT-generated reviews, which is not really surprised given that it is a new technology and code reviews are not only for review, but also for learning:

Positive Reactions (64%): A majority of the responses were positive, indicating that developers found the AI’s suggestions helpful.
Negative Reactions (30.7%): A significant portion of responses were negative. The primary reasons for dissatisfaction included the solutions not bringing extra benefits, containing bugs, or not aligning with developers’ preferred coding styles.

However, what I find really interesting.

Enhanced Prompt Strategies: Effective use of ChatGPT requires well-crafted prompts to maximize the AI’s potential in generating useful code reviews.
Tool Integration: Integrating ChatGPT with existing development tools can streamline the review process.
Continuous Monitoring: Regular assessment and refinement of ChatGPT’s outputs are necessary to ensure high-quality code reviews.

These three points are kind of cool, because they mean that we need to learn how to instruct and use these tools. That means that we loose something (knowledge about our products), while we need to learn more general skills about prompting…

Author: Miroslaw Staron

I’m professor in Software Engineering at Computer Science and Engineering. I usually blog about interesting articles (for me) and my own reflections on the development of Software Engineering, AI, computer science and automotive software. View all posts by Miroslaw Staron