Summary#
As AI absorbs the execution of AI development (see AI Accelerating AI Development), the human role contracts toward a residue that the essay *When AI builds itself* calls research taste and judgment: "choosing which problems matter, which results to trust, and when an approach is a dead end." This is the top rung of the autonomy ladder and the single capability that, if it stays human, keeps recursive self-improvement from closing the loop. Whether it stays human is the essay's load-bearing uncertainty.
The narrowing role#
The essay's clearest statement of the dynamic: "The human role is narrowing at each step in the AI development process." Two concrete narrowings:
- Writing → reviewing. "Once human- and AI-authored code quality reach parity, humans will stop writing code entirely, and shift to only reviewing it." (Parity is reported as having arrived roughly now; see AI Accelerating AI Development.) This is the harness-shrinkage story told from the human side.
- Running → choosing experiments. "Once Claude can run experiments, the question shifts towards 'Which of these experiments is worth running?'" The doing — writing the code, running the experiment, producing the result — "now costs almost nothing in human time, even if it still has costs in compute."
The residual human comparative advantage, "for now," is taste: deciding what's worth the compute. Put in the terms of Compute Allocator, the human becomes a compute allocator at the level of an entire research program rather than a single invocation.
Why it might not stay human — "just another capability"#
The essay refuses to treat research taste as a permanent human moat. Two arguments push against it:
- Perspiration is automatable, and that's most of the work. Genius is "1% inspiration and 99% perspiration"; the 99% — scale it up, see what breaks, fix it, retry — is exactly what Claude excels at. Large-scale research progress "is mostly a function of tools and resources." (See The Bitter Lesson, Recursive Self-Improvement.)
- Taste shows the same capability curve as everything else. Early evidence of improving research judgment (Claude's next-step choice beating the human's at a rate that rose from 51% to 64% on hard detour moments; see AI Accelerating AI Development) suggests "research taste might be just another AI capability that AI systems fail at for a time, then get good at." The precedent: AI was once bad at, then good at, explaining why a joke is funny, demonstrating theory of mind, and solving linguistic riddles, all qualitative skills assumed to be human-shaped. This is the optimistic face of Jagged Intelligence (Ghosts, Not Animals): today's valley is tomorrow's peak.
The honest counter, in an Anthropic employee's words: "The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task." The question is whether "as of right now" describes a stable equilibrium or a receding frontier.
The bottleneck either way#
Even if taste stays human, it becomes the binding constraint: the Amdahl's-law bottleneck of the whole pipeline. If humans can't review code as fast as Claude generates it, "human review will become the bottleneck to AI development." And if humans spend most of their time on the single-digit fraction of work that is direction-setting, each human steers vastly more work, so the quality and throughput of human judgment become the scarce resource, exactly as Verification as the New Bottleneck predicts for verification and the HBR accountability-redesign work predicts for oversight. The risk is that the human nominally decides but in practice rubber-stamps, with the "not close to substituting for senior researchers" judgment quietly eroding into a formality.
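The Amdahl's-law framing can be made concrete with a short sketch. This is an illustration rather than anything taken from the essay: the function names and the example figures (a 5% human share, a 100x automation speedup) are assumptions chosen only to show the shape of the bound.

```python
def amdahl_speedup(human_fraction: float, automation_speedup: float) -> float:
    """Overall pipeline speedup when only the non-human share is accelerated.

    human_fraction: share of total work that still needs human judgment
                    (hypothetical input, not a figure from the essay).
    automation_speedup: factor by which the remaining work is sped up.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be in [0, 1]")
    if automation_speedup <= 0:
        raise ValueError("automation_speedup must be positive")
    automated_fraction = 1.0 - human_fraction
    # Amdahl's law: the unaccelerated share stays at full cost,
    # the accelerated share shrinks by the speedup factor.
    return 1.0 / (human_fraction + automated_fraction / automation_speedup)


def speedup_ceiling(human_fraction: float) -> float:
    """Limit of amdahl_speedup as automation_speedup grows without bound."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be in [0, 1]")
    if human_fraction == 0.0:
        return float("inf")
    return 1.0 / human_fraction


if __name__ == "__main__":
    # Illustrative numbers only: 5% human share, 100x faster execution.
    print(amdahl_speedup(0.05, 100.0))
    print(speedup_ceiling(0.05))
```

Under these assumed numbers, a 100x faster execution layer yields only about a 16.8x overall speedup, and no amount of further automation pushes it past 20x. In this simple model, the human share sets the ceiling, which is why the throughput of judgment becomes the scarce resource.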
The human cost (the quieter thread)#
The essay includes unusually candid employee quotes about what the narrowing feels like — worth recording because they name a cost the productivity charts don't:
- The collapse of the gift economy of small favors: asking a colleague "can you help me get this script running?" "created a little debt, a little mutual awareness. [Claude is] faster, it creates zero debt, but each of these is a lost bid for human collaboration."
- The vertigo of contingent relevance: "On days where everything works well, I can't help but think nothing I do matters … But then there are days where everything breaks and I don't understand why and I realize I have no idea what I've been up to anymore."
Connections#
- Recursive Self-Improvement — whether taste stays human decides which of the three futures obtains; the essay's central uncertainty
- AI Accelerating AI Development — the evidence that execution is already absorbed, leaving taste as the residue
- AI R&D Autonomy Evaluation (AECI) — "not close to substituting for senior Research Scientists/Engineers" is the formal version of "taste is still human"
- Jagged Intelligence (Ghosts, Not Animals) — the joke/theory-of-mind precedent for "a capability AI fails at, then masters"; taste as the current valley
- Autonomous Scientific Discovery — Mythos 5's autonomous hypothesis-generation (~80% preferred over Opus-class) and "only high-level human input" genomics are concrete chips at the taste moat
- Verification as the New Bottleneck — if taste/review can't keep pace with generation, judgment becomes the binding constraint
- Harness Shrinkage as Models Improve — the same role-narrowing dynamic from the harness side; what's left after the model-facing harness dissolves
- Compute Allocator — taste exercised as allocation: deciding which experiments are worth the compute
- The Bitter Lesson — "research progress is mostly tools and resources" is the bitter lesson aimed at taste itself
Open questions#
- Is research taste a genuine ceiling (an architectural capability scaling can't reach) or the next jagged valley to fill? The essay calls this the decisive unknown.
- If taste is automatable, what — if anything — remains a durable human comparative advantage in AI development?
- How do you measure rubber-stamping? "Humans set direction" can be true on paper while real judgment quietly transfers to the model.
Sources#
- When AI builds itself — §"What might the future of work at Anthropic look like?" and §"What if we're wrong?"
