METR, an organization that evaluates the risks and benefits of AI, conducted a randomized controlled trial with experienced open source developers to measure the impact of LLMs on productivity.
The study found that using LLM-based tools such as Cursor Pro with Claude 3.5/3.7 Sonnet reduced productivity: developers took approximately 19% longer to complete their tasks.
The methodology focused on realistic scenarios, such as adding features, fixing bugs, and refactoring code, mirroring the developers' everyday work on open source projects.
The study identified several factors contributing to the decreased productivity with LLMs, including over-optimism about the tools' usefulness, interference with developers' existing knowledge of the code, poor performance on large codebases, and unreliable code generation.