
METR Redesigns AI Productivity Study — The Gap Between Feeling Fast and Being Fast

METR's latest findings reveal that developer self-assessment of AI productivity doesn't match measured results, prompting a complete study redesign.

metr ai-productivity developer-tools research ai-coding

What Happened

METR (Model Evaluation & Threat Research) has announced a major redesign of its AI developer productivity experiment. Its 2025 study produced a surprising result: experienced developers using AI coding tools completed tasks roughly 20% more slowly than when working without them. The 2026 replication with newer tools (Claude Code, Cursor with subagents, Codex) ran into a different problem: developers self-selected into tasks they preferred, and that selection bias made the results unreliable.

Meanwhile, industry data tells a different story on adoption: AI-generated code now accounts for 26.9% of all production code (up from 22% last quarter), and 93% of developers report using AI tools regularly.

Why This Matters

The Perception-Reality Gap

This is the most important finding in the study: developers consistently overestimate AI’s speed benefit. In controlled experiments, tasks with AI assistance frequently took longer due to prompt engineering overhead, reviewing generated code, debugging hallucinated APIs, and context-switching between the editor and AI tool.

Yet those same developers reported feeling more productive. The likely explanation: AI makes the unpleasant parts of coding (boilerplate, configuration, test setup) feel less tedious, even if the total elapsed time doesn’t decrease.

26.9% AI Code Is a Tipping Point

Regardless of speed, the volume of AI-written production code is accelerating. At 26.9%, more than one in four lines of code in production wasn’t written by a human. This has implications for:

  • Code review: Are teams reviewing AI code with the same rigor as human code?
  • Technical debt: AI-generated code tends toward verbose, over-engineered solutions. Is this accumulating silently?
  • Accountability: When AI code causes a production incident, the debugging experience is fundamentally different.

Why the Study Redesign Matters

METR is one of the few organizations doing rigorous, controlled experiments on AI productivity. Their willingness to publish negative results and acknowledge methodological problems is scientifically valuable. The industry narrative of “AI makes developers 10x faster” needs this kind of reality check.

What You Can Do

  1. Measure your own baseline — track time-to-merge for similar tasks with and without AI assistance. Personal data beats industry averages.
  2. Audit your AI code ratio — check what percentage of your recent PRs were AI-generated. If it’s above 30%, ensure your review process accounts for AI-specific failure modes.
  3. Identify where AI genuinely saves time versus where it just feels better. For most developers, AI excels at test generation, boilerplate, and documentation — not core business logic.
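Steps 1 and 2 above amount to keeping a small personal log and comparing averages. A minimal sketch of step 1, assuming you record each merged task with a flag for AI assistance and hours from first commit to merge (all task names and numbers below are illustrative, not METR's data or methodology):

```python
from statistics import mean

# Hypothetical personal log: (task, used_ai, hours from first commit to merge).
merges = [
    ("fix-auth-bug", True, 3.5),
    ("add-csv-export", True, 6.0),
    ("fix-null-deref", False, 4.0),
    ("add-rate-limit", False, 5.0),
]

def mean_hours(entries, used_ai):
    """Average time-to-merge (hours) for tasks matching the AI-usage flag."""
    return mean(h for _, ai, h in entries if ai == used_ai)

print(f"AI-assisted: {mean_hours(merges, True):.2f}h")
print(f"Unassisted:  {mean_hours(merges, False):.2f}h")
```

Even a crude log like this beats guessing: comparing your own averages over a few dozen similar tasks is exactly the kind of personal baseline step 1 recommends.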
