
Research Taste as the Human Bottleneck

Published: June 7, 2026 · Filed: Concept · Domain: Superintelligence Trajectory · Tags: Governance, Human AI Collaboration, Research, Role Evolution, Anthropic · Reading: 12 min · Source: AI-synthesised

The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an approach is a dead end; the top rung of the autonomy ladder, and the open question of whether taste is 'just another capability' AI fails at then masters


Summary#

As AI absorbs the execution of AI development (AI Accelerating AI Development), the human role contracts toward a residue the essay When AI builds itself calls research taste and judgment: "choosing which problems matter, which results to trust, and when an approach is a dead end." This is the top rung of the autonomy ladder and the single capability that, if it stays human, keeps recursive self-improvement from closing the loop. Whether it stays human is the essay's load-bearing uncertainty.

The narrowing role#

The essay's clearest statement of the dynamic: "The human role is narrowing at each step in the AI development process." Two concrete narrowings:

Writing → reviewing. "Once human- and AI-authored code quality reach parity, humans will stop writing code entirely, and shift to only reviewing it." (Parity is reported as roughly now; see AI Accelerating AI Development.) This is the harness-shrinkage story told from the human side.
Running → choosing experiments. "Once Claude can run experiments, the question shifts towards 'Which of these experiments is worth running?'" The doing — writing the code, running the experiment, producing the result — "now costs almost nothing in human time, even if it still has costs in compute."

The residual human comparative advantage, "for now," is taste: deciding what's worth the compute. Put in the terms of Compute Allocator, the human becomes a compute allocator at the level of an entire research program rather than a single invocation.

Why it might not stay human — "just another capability"#

The essay refuses to treat research taste as a permanent human moat. Two arguments push against it:

Perspiration is automatable, and that's most of the work. Genius is "1% inspiration and 99% perspiration"; the 99% — scale it up, see what breaks, fix it, retry — is exactly what Claude excels at. Large-scale research progress "is mostly a function of tools and resources." (See The Bitter Lesson, Recursive Self-Improvement.)
Taste shows the same capability curve as everything else. Early evidence of improving research judgment — Claude beating the human next-step choice 51%→64% on hard detour moments (AI Accelerating AI Development) — suggests "research taste might be just another AI capability that AI systems fail at for a time, then get good at." The precedent: AI was once bad at, then good at, explaining why a joke is funny, demonstrating theory of mind, and solving linguistic riddles — qualitative skills assumed to be human-shaped. This is the optimistic face of Jagged Intelligence (Ghosts, Not Animals): today's valley is tomorrow's peak.

A frontier researcher's practitioner reading lands on the "just another capability" side. Noam Brown (OpenAI, practitioner-opinion) reports that models will optimize the algorithms from his poker research 10–100× but cannot yet invent a better one — "I can give it a lot of time and it's still not able to do it" — so today's models are "a very good complement to researchers," "not able to fully replace the whole research cycle." But he explicitly expects this to fall: "I wouldn't be surprised if at some point — same thing with coding, same thing with math — there's just this inflection point where suddenly it's actually good enough… I wouldn't be surprised if we encounter that point for research taste as well." This is the jagged valley argument stated by someone watching the valley from inside a lab.

The honest counter, in an Anthropic employee's words: "The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task." The question is whether "for right now" is a stable equilibrium or a receding frontier.

The third answer: it isn't a capability at all#

Both arguments above — taste as a durable moat, taste as the next jagged valley — presuppose that taste is a capability, something a mind either has or acquires. Andrew Ng rejects the premise (June 2026, practitioner-opinion):

"Many people describe this human contribution as 'taste,' but I prefer to think of it as humans having a context advantage, since that gives us a clearer path to helping AI systems get better… So long as the human knows something the AI does not, human-in-the-loop is needed to inject that knowledge into the system."

Under this reading the residue is not a faculty but an information asymmetry — the human knows the users, the constraints, the history, the thing so obvious nobody wrote it down. That makes the human role falsifiable (does the asymmetry exist?), perishable (it closes as context transfers), and an engineering problem rather than a mystique. It also means the question this page is organized around — ceiling or valley? — may be malformed. See Context Advantage, Not Taste for the full treatment, including where the reframe fails: it explains the deployment asymmetry (I know my users) but not the generative one (The Abstraction Barrier, Transformative Creativity), where the missing thing is a concept, not a fact.

The two frames disagree about what would count as evidence. Returns to Expertise in Agentic Coding is the instrument that distinguishes them: if the expertise premium declines because models acquired judgment, taste was a capability; if it declines because context transfer improved, taste was an asymmetry all along.

The bottleneck either way#

Even if taste stays human, it becomes the binding constraint — the Amdahl's-law bottleneck of the whole pipeline. If humans can't review code as fast as Claude generates it, "human review will become the bottleneck to AI development." And if humans spend most of their time on the single-digit fraction of work that is direction-setting, each human steers vastly more work — so the quality and throughput of human judgment becomes the scarce resource, exactly as Verification as the New Bottleneck predicts for verification and the HBR accountability-redesign work predicts for oversight. The risk is that the human nominally decides but actually rubber-stamps (the "not close to substituting for senior researchers" judgment quietly eroding into a formality).
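The Amdahl's-law framing can be made concrete with a small calculation. The sketch below is illustrative only: the 5% human-judgment share and the acceleration factors are hypothetical numbers chosen to match the "single-digit fraction" wording, not figures from the essay, and `amdahl_speedup` and `speedup_ceiling` are names introduced here for the example.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster.

    Amdahl's law: speedup = 1 / ((1 - p) + p / s), where p is the accelerated
    share of the work and s is how much faster that share becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / factor)


def speedup_ceiling(accelerated_fraction: float) -> float:
    """Limit of the speedup as the acceleration factor grows without bound."""
    unaccelerated = 1.0 - accelerated_fraction
    return float("inf") if unaccelerated == 0 else 1.0 / unaccelerated


if __name__ == "__main__":
    # Hypothetical assumption: 5% of the pipeline is human direction-setting
    # and review, and it does not get faster; the other 95% is execution
    # that AI accelerates.
    HUMAN_FRACTION = 0.05
    execution_fraction = 1.0 - HUMAN_FRACTION

    for factor in (10, 100, 1000):
        overall = amdahl_speedup(execution_fraction, factor)
        print(f"execution {factor:>4}x faster -> pipeline {overall:.1f}x faster")

    print(f"ceiling: {speedup_ceiling(execution_fraction):.1f}x")
```

Under these assumed numbers, making execution 1000 times faster speeds the whole pipeline up by a little under 20 times, because the unaccelerated 5% comes to dominate the total. That is the sense in which human judgment becomes the binding constraint.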

The human cost (the quieter thread)#

The essay includes unusually candid employee quotes about what the narrowing feels like — worth recording because they name a cost the productivity charts don't:

The collapse of the gift economy of small favors: asking a colleague "can you help me get this script running?" "created a little debt, a little mutual awareness. [Claude is] faster, it creates zero debt, but each of these is a lost bid for human collaboration."
The vertigo of contingent relevance: "On days where everything works well, I can't help but think nothing I do matters … But then there are days where everything breaks and I don't understand why and I realize I have no idea what I've been up to anymore."

Connections#

Recursive Self-Improvement — whether taste stays human decides which of the three futures obtains; the essay's central uncertainty
AI Accelerating AI Development — the evidence that execution is already absorbed, leaving taste as the residue
AI R&D Autonomy Evaluation (AECI) — "not close to substituting for senior Research Scientists/Engineers" is the formal version of "taste is still human"
Jagged Intelligence (Ghosts, Not Animals) — the joke/theory-of-mind precedent for "a capability AI fails at, then masters"; taste as the current valley
Autonomous Scientific Discovery — Mythos 5's autonomous hypothesis-generation (~80% preferred over Opus-class) and "only high-level human input" genomics are concrete chips at the taste moat
Verification as the New Bottleneck — if taste/review can't keep pace with generation, judgment becomes the binding constraint
Harness Shrinkage as Models Improve — the same role-narrowing dynamic from the harness side; what's left after the model-facing harness dissolves
Compute Allocator — taste exercised as allocation: deciding which experiments are worth the compute
The Bitter Lesson — "research progress is mostly tools and resources" is the bitter lesson aimed at taste itself
The Abstraction Barrier — DeepMind's theoretical case that taste/novel-concept-discovery is a real ceiling, not just the next capability to fall: AI may be bounded by human conceptual frameworks
Transformative Creativity — Boden level-3 (creating new conceptual spaces) is taste exercised at the frontier of concept-space; the "Einstein test" version of the moat
Multi-Agent Collective Intelligence — steering large superhuman-speed agent groups (whose output humans can't fully consume) is the form taste takes under the collective pathway
Instrumental Convergence — a knowledge-seeking objective is an alternative framing of judgment as positive-sum information gain rather than scarce human taste
Artificial Superintelligence (ASI) — taste/judgment is the candidate for what (if anything) stays human as systems cross from AGI into ASI
Returns to Expertise in Agentic Coding — the labor-data instance of the same "is taste durable?" question: Anthropic frames a decrease in the returns to domain expertise over time as the signal that "models are starting to supply the essential judgment users currently bring" — taste-as-just-another-capability, made empirically trackable in usage
Planning / Execution Division of Labor — the rubber-stamping risk made concrete: humans nominally own ~70% of planning decisions, but whether that control is real judgment or approval-by-default is exactly the "how do you measure rubber-stamping?" question
Implementation Abundance Inverts Product Work — the product-work cousin: as implementation cheapens, curation/taste becomes the expensive step, the same residue named here for AI research
Why AI Lags at Design — design taste as a currently-durable pocket of the human-as-reward-function; a concrete domain where the taste moat holds longer
Role Averaging, Not Role Elimination — "taste-makers guide from inception" carries the same rubber-stamping risk: guidance is only real if it isn't approval-by-default
Context Advantage, Not Taste — the third answer: Andrew Ng argues taste is not a capability but an information asymmetry, which makes it neither ceiling nor valley but a closable gap; dissolves this page's central question for product work while leaving it standing at the research frontier
Unknowns as the Agentic Bottleneck — if the residue is context rather than taste, elicitation is the response; Thariq's blindspot passes and interviews are the protocol for pumping it across
Noam Brown — a frontier researcher's practitioner reading: models optimize his poker-research algorithms 10–100× but cannot yet invent a better one; he expects an inflection point for research taste like the ones in coding and math
Researcher Uplift from Code Output — the quantitative shadow of this split: Kwa's model prices code/engineering uplift and treats research judgment as the unmodeled residual (assumes zero non-code uplift), matching Anthropic's "concentrated in engineering execution rather than research judgment"; a measured non-code uplift > 1 would be taste starting to fall

Open Questions#

Is research taste a genuine ceiling (an architectural capability scaling can't reach) or the next jagged valley to fill? The essay calls this the decisive unknown. Contested premise: Ng argues the question presupposes taste is a capability at all.
If taste is automatable, what — if anything — remains a durable human comparative advantage in AI development?
How do you measure rubber-stamping? "Humans set direction" can be true on paper while real judgment quietly transfers to the model.

Sources#

When AI builds itself — §"What might the future of work at Anthropic look like?" and §"What if we're wrong?"
Thread by @AndrewYNg — Andrew Ng, The Batch (2026-06-30), practitioner-opinion: the "context advantage, not taste" reframing
Really Big Test-Time Compute in AI Changes Benchmarks, Safety and Research with OpenAI's Noam Brown — Noam Brown (No Priors, 2026-06-26), practitioner-opinion: research taste as the residual human role; "not able to fully replace the whole research cycle," but expects an inflection point


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 27

The Abstraction Barrier
Lerchner's hypothesis that AI trained on human concepts may be unable to discover genuinely novel conceptual primitives…
AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
AI R&D Autonomy Evaluation (AECI)
How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives…
Andrew Ng
Founder of DeepLearning.AI and AI Fund, founding lead of Google Brain, co-founder of Coursera; writes The Batch, where…
Artificial Superintelligence (ASI)
DeepMind's informal characterization of ASI as a system that exceeds large, well-coordinated human-expert collectives a…
Autonomous Scientific Discovery
Mythos-class models now conduct novel science with limited human input — autonomous protein/drug design (~10× faster, m…
Compute Allocator
The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding…
Context Advantage, Not Taste
Andrew Ng's reframing of the residual human contribution: not 'taste' but an information asymmetry — 'so long as the hu…
Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
Implementation Abundance Inverts Product Work
Andrew Ambrosino's inversion thesis: when talking to a frontier model can stand up any feature from scratch, implementa…
Instrumental Convergence
Omohundro/Bostrom's thesis that whatever an AI's final goal, it tends to pursue universally useful sub-goals — resource…
Jagged Intelligence (Ghosts, Not Animals)
"Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the l…
Superintelligence Trajectory
Map of Content for the superintelligence-trajectory domain — 20 concepts. The path from AGI to ASI: recursive self-impr…
Multi-Agent Collective Intelligence
DeepMind's fourth pathway to ASI: superintelligence as an emergent property of many coordinated AGI agents — group agen…
Noam Brown
OpenAI research scientist and a pioneer of inference-time (test-time) compute scaling; earlier built superhuman poker A…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Planning / Execution Division of Labor
Anthropic's 400K-session telemetry: in a typical Claude Code session humans make ~70% of planning decisions (what to do…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Researcher Uplift from Code Output
Thomas Kwa (METR) translates Anthropic's reported 8× code-per-engineer-per-day into serial researcher uplift with produ…
Returns to Expertise in Agentic Coding
Anthropic's 400K-session study: domain expertise (not coding skill) is what amplifies an agent — experts get 2× the act…
Role Averaging, Not Role Elimination
Andrew Ambrosino's nuanced OpenAI-side take on role collapse: your role is 'the average of what you spend your time on'…
RSI Growth Curves: Which Friction Binds First?
DeepMind's exponential/hyperbolic/S-curve growth shapes are Anthropic's compounding-efficiency/full-RSI/stalled futures…
The Bitter Lesson
Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolv…
Transformative Creativity
Boden's three-level model of creativity (combinational, exploratory, transformative) used to locate today's AI achievem…
Unknowns as the Agentic Bottleneck
Thariq Shihipar's map-vs-territory thesis: the gap between what you told the agent and what the work actually requires…
Verification as the New Bottleneck
Fiona Fung: coding is no longer the bottleneck — verification, review, maintenance are; shift-left; TDD loses its tax;…
Why AI Lags at Design
Andrew Ambrosino's four reasons frontier models are worse at visual/product design than at code: design is hard to grad…

Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Context Advantage, Not Taste
Andrew Ng's reframing of the residual human contribution: not 'taste' but an information asymmetry — 'so long as the hu…
Intelligence Explosion Dynamics
The growth-curve question behind recursive self-improvement: whether AI-accelerating-AI produces exponential, super-exp…
Task Time-Horizon Scaling
METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…
