
Agentic code review: AI hunts the bugs. You still own the merge.

Anthropic's Code Review feature runs a fleet of agents against every pull request in parallel. Each agent targets a different class of bug. A verification step filters false positives before anything reaches the engineer. Findings post as inline comments on the exact lines where issues were found.

The numbers from Anthropic's internal rollout:

  • 54% of PRs now get findings, up from 16% with previous methods
  • Large PRs average 7.5 real issues caught
  • Less than 1% of findings marked incorrect by engineers

That's a meaningful signal-to-noise ratio for an automated system. Most automated review tools are abandoned because the false positive rate is too high to trust. Less than 1% incorrect is a different category of tool.

What it actually does

This is an agentic pattern in practice: parallel specialization, then consolidation. Each agent hunts one class of problem — security, logic errors, type safety, edge cases, whatever you configure. They run simultaneously. Results merge. A verification layer filters what survives before it surfaces to the engineer.
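The shape of that pattern can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's implementation — the agent names, the `run_agent` and `verify` functions, and the toy matching logic are all stand-ins for real model calls:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str    # which specialist flagged it
    line: int     # where in the diff
    message: str

async def run_agent(name: str, diff: str) -> list[Finding]:
    # Stand-in for a model call; each agent hunts one class of problem.
    await asyncio.sleep(0)
    if name == "security" and "password" in diff:
        return [Finding(agent=name, line=1, message="possible credential leak")]
    return []

async def verify(finding: Finding) -> bool:
    # Stand-in for the verification pass that filters false positives.
    await asyncio.sleep(0)
    return True

async def review(diff: str, agents: list[str]) -> list[Finding]:
    # 1. Run all specialist agents in parallel.
    results = await asyncio.gather(*(run_agent(a, diff) for a in agents))
    merged = [f for per_agent in results for f in per_agent]
    # 2. Verify each finding before anything reaches the engineer.
    checks = await asyncio.gather(*(verify(f) for f in merged))
    return [f for f, ok in zip(merged, checks) if ok]

findings = asyncio.run(review("if password == input:", ["security", "logic", "types"]))
print([f.message for f in findings])  # ['possible credential leak']
```

The structural point is the two `gather` calls: specialization is parallel, but verification gates everything that surfaces.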

One concrete case from Anthropic's rollout: a developer made a small code edit that would have silently broken authentication. The agents caught it before merge. Small change, serious consequence — exactly the category that slips through because it's too subtle to flag in a quick human skim.

The review runs in about 20 minutes. Cost is $15–25 per PR, scaling with size. Currently available as a research preview.

The REVIEW.md file

You control what gets flagged through a REVIEW.md file in your repo. Define what to always check. What to skip. What conventions matter for this specific codebase.
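A sketch of what such a file might contain — the structure here is illustrative, not the documented format, so check the actual documentation before copying it:

```markdown
# REVIEW.md (hypothetical example)

## Always check
- Any change touching authentication or authorization, however small
- Error handling around external API calls

## Skip
- Formatting-only changes already covered by the linter
- Generated files

## Conventions
- Database access goes through the repository layer, never raw SQL in handlers
```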

This is not a configuration detail. It is the structured handoff between human judgment and agent execution.

Your accumulated knowledge of the codebase — what matters, what doesn't, what the team has already decided and why — becomes machine-readable instruction. The agents work within the frame you set. They do not invent standards. They apply yours.

Without that file, you're delegating without a brief. With it, you're orchestrating.

The REVIEW.md is where experience goes from tacit to explicit. Writing it well is a skill — the same skill that makes a good code review checklist, a good runbook, or a good architecture decision record. The quality of the agent output is bounded by the quality of the frame you give it.

What doesn't change

The agents find issues. Engineers decide what to do with them. That accountability doesn't transfer to the tool.

This is the same principle that applies across AI-assisted development. More coverage doesn't mean less ownership. It means the bar for what you missed is higher — because the system already caught the things it was configured to catch.

54% of PRs get findings. Someone still has to read those findings, understand them in context, and decide whether they represent real risk. That requires domain knowledge. It requires the same judgment that made a good code reviewer before agents existed.

The surface area of what you're accountable for changes. The accountability itself doesn't.

The pattern worth understanding

Regardless of whether you adopt this specific tool, the architecture is worth studying:

  • Parallel agents, each specialized to one task class
  • A verification step to filter noise before it reaches humans
  • Human-readable output at the exact decision point
  • A configuration file where human judgment is encoded upfront

This is what agentic systems should look like when applied to workflows with real consequences. The agents amplify the review, not replace it. The REVIEW.md is the brief. The agents execute. The engineer signs off.

The authentication bug would have shipped. It didn't. That is the point — and the limit.


Source: Claude Code Review documentation