Verification as the New Bottleneck

Sources

Running an AI-native engineering org

Summary

Fiona Fung's central claim from running Claude Code + Cowork engineering: for years, engineering bandwidth was the expensive resource — planning, reviews, and process all existed to protect it. Once agentic coding made coding cheap, the bottleneck moved to verification, review, and maintenance. "On the Claude Code team, coding is really not the slow part anymore." The new scarce resource is confidence that the change is correct — and it gets scarcer precisely because bandwidth (and therefore throughput) exploded.

Why verification is now the constraint

Three forces converge:

Volume. Bandwidth increased so much that "we have to pay even more attention to: is it correct."
Blurring roles. More people (designers, managers, PMs) now check in changes, so everyone needs confidence their change is correct.
Maintenance cost. Higher throughput means more to maintain — the cost of maintenance becomes a first-class concern, not an afterthought.

This is the org-level mirror of Karpathy's The Verifiability Thesis ("LLMs automate what you can verify") and the demand side of Harness Shrinkage as Models Improve (prompt scaffolding shrinks; mechanical verification stays load-bearing).

TDD loses its tax

A vivid sign of the shift: TDD used to feel like "eating broccoli" (write the failing test first, verify that it fails, then fix the code). With Claude, Fung found it "so much more fun and pleasurable… it took the tax out of test-driven development." The economics flipped: when writing the test is nearly free, the discipline that grounds verification (a test that provably fails, then passes) is pure upside. (Cf. the TDD / red-green-refactor discipline; the failing-test-first step is the verifier.)
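The red-then-green step can be sketched in a few lines of Python. This is only an illustration of the discipline, not anything from Fung's talk: `slugify`, `slugify_stub`, `CASES`, and `passes` are hypothetical names invented for the example.

```python
from typing import Callable

# The test is written first. These cases are the spec for a hypothetical
# slugify() helper (an assumption made up for this illustration).
CASES = [
    ("  Shift   Left ", "shift-left"),
    ("TDD", "tdd"),
]


def passes(impl: Callable[[str], str]) -> bool:
    """Return True only if the implementation satisfies every case."""
    return all(impl(raw) == want for raw, want in CASES)


def slugify_stub(title: str) -> str:
    # Red step: a placeholder that must fail, proving the test can fail.
    return title


def slugify(title: str) -> str:
    # Green step: lowercase, collapse whitespace, join words with hyphens.
    return "-".join(title.lower().split())


if __name__ == "__main__":
    print("red (stub passes?):", passes(slugify_stub))
    print("green (real passes?):", passes(slugify))
```

The point of running the stub first is that a check which has been seen to fail is the thing that makes the later pass meaningful.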

Shift left

Her recurring phrase: shift left — catch problems closer to the source via automation, not after a customer hits them. "What's better than me running into the bug first? Having automation in place to catch it closer to the source." As throughput rises, the only way verification keeps up is by being automated and early rather than manual and late.

Who reviews — and the human-in-the-loop line

Before shipping Claude Code's own code-review feature, "how do you keep up with code reviews?" was her most-asked question. The answer: Claude Code review handles style, lint, obvious bugs, and spec drift (if you check the spec into the codebase, "Claude is very good about verifying against spec drift"). But humans stay in the loop where it matters: legal review, risk tolerance, and trust boundaries. In her words: "trust but verify, and where humans bring needed expertise." The division of labor: automate the mechanical verification, reserve human judgment for risk and trust-boundary calls. (Cf. Deep Modules for Agents: reviewer in a fresh context.)
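One way to picture that division of labor is a small routing rule: every change gets automated review, and only changes touching sensitive areas are additionally flagged for a human. The sketch below is an assumption-laden illustration, not a description of how Claude Code review works; the path patterns and the `HUMAN_REVIEW_PATTERNS` and `reasons_for_human_review` names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical mapping from a reason for human review to the path patterns
# that trigger it. A real org would define its own list.
HUMAN_REVIEW_PATTERNS: Dict[str, List[str]] = {
    "legal": ["LICENSE*", "legal/*"],
    "trust_boundary": ["auth/*", "*/permissions/*"],
}


def reasons_for_human_review(changed_paths: Iterable[str]) -> List[str]:
    """Return the sorted reasons a human should also review this change.

    An empty list means automated review alone is considered enough
    under this (hypothetical) policy.
    """
    reasons = set()
    for path in changed_paths:
        for reason, patterns in HUMAN_REVIEW_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                reasons.add(reason)
    return sorted(reasons)


if __name__ == "__main__":
    print(reasons_for_human_review(["src/ui/button.py"]))
    print(reasons_for_human_review(["auth/session.py", "LICENSE"]))
```

Keeping the policy as data rather than code means the risk-tolerance decision stays a human-owned list, while applying it is mechanical.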

Measuring the shift (and a trap)

Signals she watches: onboarding ramp-up time ↓, PR cycle time ↓, Claude-assisted commits ↑ ("I haven't seen a commit that wasn't Claude-assisted in months"). The trap: don't read end-to-end PR cycle time alone; break it into funnel chunks. If cycle time isn't dropping, the cause may not be low AI adoption; it could be CI/build systems jamming under the new throughput. And throughput isn't the goal: "find some way to measure whatever you're actually trying to solve," not just velocity.
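Breaking cycle time into funnel chunks might look like the following sketch. The stage names, the `PullRequest` fields, and the sample timestamps are all assumptions chosen for illustration; real data would come from whatever the code host and CI system expose.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Dict, List

# Hypothetical funnel: (stage name, start field, end field).
STAGES = [
    ("wait_for_review", "opened", "first_review"),
    ("review", "first_review", "approved"),
    ("ci_and_merge", "approved", "merged"),
]


@dataclass
class PullRequest:
    opened: datetime
    first_review: datetime
    approved: datetime
    merged: datetime


def stage_hours(pr: PullRequest) -> Dict[str, float]:
    """Hours one PR spent in each funnel stage."""
    return {
        name: (getattr(pr, end) - getattr(pr, start)).total_seconds() / 3600
        for name, start, end in STAGES
    }


def median_by_stage(prs: List[PullRequest]) -> Dict[str, float]:
    """Median hours per stage across all PRs."""
    rows = [stage_hours(pr) for pr in prs]
    return {name: median(row[name] for row in rows) for name, _, _ in STAGES}


# Made-up sample data: review is fast, the post-approval stage is slow.
prs = [
    PullRequest(datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 10, 0),
                datetime(2025, 1, 6, 11, 0), datetime(2025, 1, 6, 17, 0)),
    PullRequest(datetime(2025, 1, 7, 9, 0), datetime(2025, 1, 7, 9, 30),
                datetime(2025, 1, 7, 10, 30), datetime(2025, 1, 7, 18, 30)),
]
medians = median_by_stage(prs)
slowest = max(medians, key=medians.get)

if __name__ == "__main__":
    print(medians)
    print("slowest stage:", slowest)
```

In this invented sample the end-to-end number hides that most of the time sits after approval, which is the kind of CI/build jam the paragraph above warns about.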

Connections

Fiona Fung — author of the thesis
The Verifiability Thesis — Karpathy's "automate what you can verify" is the model-level cause; this is the org-level consequence
Harness Shrinkage as Models Improve — the synthesis it confirms: scaffolding shrinks, mechanical verification doesn't
Evals as Product Spec — Cat Wu's evals are verification encoded as product spec; the PM-side companion
Code as Source of Truth — checking the spec into the repo is what lets Claude verify spec drift
Building Is Cheap, Arguing Is Expensive — the upstream half: generation is cheap, so verification (and judgment) is where cost concentrates
Claude Code Auto Mode — the auto-approve classifier is verification automation at the permission layer
Deep Modules for Agents — reviewer-in-fresh-context is the verification-quality move at the code-review layer
AI Brain Fry — the risk if verification stays manual: oversight fatigue increases errors as volume grows
AI-Driven Formal Proof Search — the extreme case: a compiler as the verifier, so the bottleneck is fully mechanized

Open Questions

Fung's own open question: "How far do you push fully automated reviews?" — where's the speed/safety balance, and how do you keep humans confident without re-introducing the review bottleneck?
If CI/build is the hidden jam, does verification infrastructure (test runners, CI capacity) become the actual capex of an AI-native org?
