
METR's February 2026 Update: Developers Are Still 19% Slower With AI — and 7 Tools Make It Worse

METR's February 2026 update confirms the productivity paradox: experienced developers using AI feel 20% faster but measure 19% slower. ActivTrak data shows focus efficiency at a 3-year low. The fix isn't better AI tools — it's fewer.


METR published an updated version of its AI productivity research in February 2026. The headline finding hasn’t changed: experienced developers using AI tools complete tasks 19% slower than matched developers without AI access, while simultaneously reporting feeling 20% faster.

The new data from ActivTrak’s 2026 Workplace State Report adds a structural dimension: focus efficiency across organizations has fallen to 60% — a three-year low. The average organization now runs seven or more AI tools simultaneously. The data shows a threshold effect: productivity declines when teams use more than three AI tools concurrently.

The Measurement Problem Is Still Unsolved

METR’s experimental design remains a randomized controlled trial — the same methodology used in drug efficacy research. Participants are randomly assigned to AI-enabled or AI-disabled conditions and asked to complete real coding tasks.

The challenge METR continues to acknowledge is self-selection bias: developers who agree to participate in controlled productivity experiments may skew toward those already skeptical of AI tools, which would bias the measured result in the negative direction.

METR’s response to this concern is instructive: rather than claiming the 19% result is definitive, they’re redesigning the experiment to address the selection problem. The February update is not “AI definitively makes developers slower.” It’s “our current best measurement says slower, and we’re working to validate or refute that.”

The discomfort for the industry is that we don’t have a more rigorous measurement that says otherwise.

The ActivTrak Data Is a Different Kind of Finding

METR measures task completion in controlled conditions. ActivTrak measures real-world focus patterns across organizations. They’re different signal types, and the ActivTrak finding is arguably more alarming:

  • 60% focus efficiency — the lowest measured in three years
  • 7+ AI tools is now the organizational average
  • 3-tool threshold: organizations using more than 3 AI tools concurrently show declining productivity

The 7-tool average represents a predictable adoption pattern. Each AI tool was added because someone demonstrated it solved a specific problem. Nobody added them all at once, and nobody measured the systemic overhead of switching, prompt-crafting, output-reviewing, and context-switching across all of them.

This is the AI productivity paradox at the organizational level: each tool looks beneficial in isolation, but the portfolio of tools creates coordination cost that exceeds the individual gains.
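The portfolio argument can be sketched as a toy model. Everything here is a made-up illustration, not data from the ActivTrak report: individual tool gains are assumed to diminish, while coordination overhead is assumed to grow roughly quadratically with tool count, since every tool adds context switches against all the others.

```python
def net_productivity_gain(n_tools: int,
                          gain_per_tool: float = 0.10,
                          overhead_factor: float = 0.02) -> float:
    """Net fractional productivity change for a portfolio of n AI tools.

    All parameters are hypothetical, chosen only to illustrate the shape
    of the argument: gains diminish per tool, overhead compounds.
    """
    # Diminishing returns: each additional tool adds less than the last.
    gains = sum(gain_per_tool / (i + 1) for i in range(n_tools))
    # Coordination cost (switching, prompting, reviewing) grows
    # superlinearly with the number of tools in play.
    overhead = overhead_factor * n_tools ** 2
    return gains - overhead

for n in (1, 3, 7):
    print(n, round(net_productivity_gain(n), 3))
# One tool: a net gain. Three tools: roughly neutral. Seven: a net loss.
```

With these assumed parameters the model reproduces the qualitative pattern in the data, which is the point: no single tool has to be bad for the portfolio to be net-negative.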

Why Experienced Developers Are More Susceptible

METR’s finding is specifically about experienced developers — not beginners. This is the counterintuitive result that gets less attention than it deserves.

Beginner developers using AI tools do show productivity improvements. The tools compensate for knowledge gaps in syntax, API usage, and common patterns. AI assistance is net-positive when it replaces lookup time and fills knowledge voids.

Experienced developers are a different case. Their bottleneck isn’t knowledge — it’s judgment and architecture. For them, AI tools introduce a verification burden: every AI-generated output requires assessment against mental models that the developer already holds. The experienced developer can’t skip the review step the way a beginner might; they know too much to accept output uncritically.

The AI tool generates code faster than the developer would write it. The experienced developer then reviews that output, and review time scales with the complexity of what was generated. The net result: slower, not faster.
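The verification-burden argument can also be made concrete with a small sketch. All parameters here are hypothetical assumptions for illustration: generation is near-instant, but reviewing unfamiliar AI output is assumed to be slower per unit of complexity than writing the code yourself.

```python
def manual_time(complexity: float, write_rate: float = 1.0) -> float:
    """Time for an experienced developer to write the code directly."""
    return complexity / write_rate

def ai_assisted_time(complexity: float,
                     gen_time: float = 0.2,
                     review_rate: float = 0.7) -> float:
    """Prompting plus generation is near-constant; review scales with complexity."""
    # Assumption: checking AI output against your own mental model is
    # slower per unit of complexity than writing it (review_rate < write_rate),
    # because the reviewer cannot skip verification.
    return gen_time + complexity / review_rate

for c in (0.5, 1.0, 5.0):
    print(c, manual_time(c), round(ai_assisted_time(c), 2))
```

Under these assumptions the AI-assisted path is slower at every complexity level, which is the experienced-developer case; flip the rates (review faster than writing, as when AI replaces lookup time for a beginner) and the sign flips too.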

The Practical Answer: Tool Reduction

The ActivTrak three-tool threshold suggests a practical intervention that requires no new AI capability: reduce the number of AI tools your team uses.

This is uncomfortable because it runs counter to the instinct to adopt every tool that promises productivity gains. But the systemic data says:

  • One tool used deeply → productivity gain
  • Three tools used adequately → neutral
  • Seven tools used shallowly → productivity loss

For developers using Claude Code: this is the case for focusing there rather than distributing attention across multiple AI coding assistants. The marginal benefit of adding a second AI coding tool is likely negative when measured against the coordination overhead.

The METR finding doesn’t say AI doesn’t work. It says we’re measuring productivity incorrectly and that current tool portfolios are creating overhead that cancels out gains. Both are solvable problems — but only if we’re honest about the measurement.


Source: METR — AI Productivity Uplift Update, February 2026
