
Research Taste as the Human Bottleneck

Published: June 7, 2026 · Filed: Concept · Domain: Superintelligence Trajectory · Tags: Governance, Human AI Collaboration, Research, Role Evolution, Anthropic · Reading: 12 min · Source: AI-synthesised

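As AI absorbs execution work, the human role contracts to choosing which problems matter, which results to trust, and when an approach has hit a dead end. This is the top rung of the autonomy ladder, and it leaves an open question: is taste just "another capability" that AI is temporarily bad at and will eventually master?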

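Summary

Once AI absorbs the execution work of AI development (AI Accelerating AI Development), the human role contracts to a residual capability that the essay When AI builds itself calls research taste and judgment: "choosing which problems matter, which results to trust, and when an approach has hit a dead end." It is the top rung of the autonomy ladder, and the only capability that, if it stays in human hands, keeps recursive self-improvement from closing the loop. Whether it will stay in human hands is the essay's load-bearing uncertainty.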

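The narrowing role

The essay's clearest description of the dynamic: "At every step of human AI development, the human role is narrowing." It names two concrete narrowings:

Writing → reviewing. "Once human-written and AI-written code reach parity in quality, humans will stop writing code entirely and only review." (That parity has reportedly been reached around now; see AI Accelerating AI Development.) This is the harness-shrinkage story told from the human side.
Running → choosing experiments. "Once Claude can run experiments, the question shifts to: 'Which of these experiments are worth running?'" Doing things (writing code, running experiments, producing results) "now costs almost no human time, even though it still costs compute."

What remains as the human comparative advantage, "for now," is taste: deciding what is worth spending compute on. In Compute Allocator terms, the human no longer allocates compute at the level of individual calls but at the level of the whole research program.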

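Why it might not stay human: "just another capability"

The essay declines to treat research taste as a permanent human moat. Two arguments cut against the moat view:

**The grunt work can be automated, and it is most of the work.** Genius is "1% inspiration and 99% perspiration"; that 99% (scaling up, seeing what breaks, fixing, retrying) is exactly what Claude is good at. Research progress at scale is "mainly a function of tools and resources." (See The Bitter Lesson, Recursive Self-Improvement.)
**Taste shows the same capability curve as everything else.** Early evidence that research judgment is improving (at difficult detour moments, Claude's rate of choosing the right next step rose from 51% to 64%; see AI Accelerating AI Development) suggests that "research taste may be just another AI capability, one that AI systems are temporarily bad at and then gradually become good at." The precedent: AI was once bad at explaining why a joke is funny, showing theory of mind, and solving language puzzles, and later became able to do all three. These were qualitative skills once thought to be uniquely human. This is the optimistic face of Jagged Intelligence (Ghosts, Not Animals): today's valley may be tomorrow's peak.

One frontier researcher, reading the question from practice, comes down on the "just another capability" side. Noam Brown (OpenAI, practitioner-opinion) says models can optimize the algorithms from his poker research by 10–100×, but currently cannot invent better ones ("I can give it a lot of time, and it still can't do it"), so today's models are "a very good complement to researchers" that "can't yet fully replace the whole research loop." But he explicitly expects that to change: "If at some point, as happened with coding and with math, there's suddenly an inflection point where the models are really good enough... I wouldn't be surprised if research taste hits that inflection point too." This is the jagged-valley argument, made by someone watching the valley from inside a lab.

An Anthropic employee offers the honest counterpoint: "As of now, the human comparative advantage is still in seeing the big picture and thinking beyond the task at hand." The question is whether "for now" describes a stable equilibrium or a receding frontier.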

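A third answer: it is not a capability at all

Both positions above (taste is a durable moat, or taste is the next jagged valley) assume that taste is a capability, something a mind either has or acquires. Andrew Ng rejects that premise (June 2026, practitioner-opinion):

"A lot of people call this human contribution 'taste,' but I prefer to think of it as humans having a context advantage, because this lets us see more clearly how to help AI systems get better... So long as the human knows something the AI doesn't, a human-in-the-loop is needed to inject that knowledge into the system."

On this reading, the residual is not a faculty but an information asymmetry: the human knows the users, the constraints, the history, and the thing so obvious that nobody wrote it down. That makes the human role falsifiable (does the asymmetry exist?), perishable (the gap closes as context is transferred), and an engineering problem rather than mysticism. It also means the question this page is organized around (ceiling or valley?) may have been the wrong question from the start. See Context Advantage, Not Taste for the full discussion, including where the reframing breaks down: it explains deployment asymmetries (I know my users) but not generative asymmetries (the abstraction barrier, transformative creativity), where what is missing is a concept rather than a fact.

The two framings disagree about what counts as evidence. Returns to Expertise in Agentic Coding is the instrument for telling them apart: if the expertise premium falls because models gain judgment, taste is a capability; if it falls because context transfer improves, taste was an asymmetry all along.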

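A bottleneck either way

Even if taste stays human, it becomes the binding constraint: the Amdahl's-law bottleneck of the whole pipeline. If humans cannot review code as fast as Claude generates it, "human review will become the bottleneck for AI development." And if humans spend most of their time on the single-digit percentage of work that sets direction, each person can steer far more work, so the quality and throughput of human judgment become the scarce resource, as Verification as the New Bottleneck predicts for verification and HBR's accountability-reset research predicts for oversight. The risk is that humans decide nominally while in practice rubber-stamping (the judgment that models are "still far from replacing senior researchers" quietly decays into a formality).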
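
The Amdahl's-law framing can be made concrete with a small sketch. It assumes, purely for illustration, that human judgment is a fixed 5% share of the pipeline (the text says only "single-digit percentage") and that AI accelerates the remaining share by a chosen factor. The function names and all numbers are hypothetical and do not come from the essay.

```python
import math


def amdahl_speedup(human_fraction: float, ai_speedup: float) -> float:
    """Overall pipeline speedup when only the non-human share is accelerated.

    human_fraction: share of the original work that still runs at human speed
                    (review, choosing experiments); an assumed input.
    ai_speedup:     factor by which the remaining share is accelerated.
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be in [0, 1]")
    if ai_speedup <= 0:
        raise ValueError("ai_speedup must be positive")
    automated_fraction = 1.0 - human_fraction
    return 1.0 / (human_fraction + automated_fraction / ai_speedup)


def speedup_ceiling(human_fraction: float) -> float:
    """Limit of amdahl_speedup as ai_speedup grows without bound."""
    if human_fraction == 0.0:
        return math.inf
    return 1.0 / human_fraction


if __name__ == "__main__":
    human_fraction = 0.05  # assumed: 5% of the work is human judgment
    for ai_speedup in (10, 100, 1000):
        overall = amdahl_speedup(human_fraction, ai_speedup)
        print(f"AI {ai_speedup}x faster -> pipeline {overall:.1f}x faster")
    print(f"ceiling: {speedup_ceiling(human_fraction):.0f}x")
```

Under these assumed numbers, a 100× faster execution layer yields only about a 16.8× faster pipeline, and no amount of further execution speedup exceeds 20×. Past that point, only faster or better human judgment (or a smaller human share) moves the total, which is the sense in which taste is the bottleneck either way.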

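The human cost (a quieter thread)

The essay includes unusually candid employee quotes about how the narrowing feels. They are worth recording because they point to costs that productivity charts do not show:

The gift economy of small favors collapses: asking a colleague "Can you help me get this script running?" used to "create a little debt, a little mutual understanding. [Claude] is faster and creates no debt at all, but every one of those is a lost invitation to human collaboration."
The vertigo of contingent relevance: "On days when everything works, I can't help thinking that nothing I do seems to matter... but there are also days when everything breaks and I don't understand why, and I realize I no longer have any idea what I'm actually doing."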

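Open questions

Is research taste a true ceiling (an architectural capability that scaling cannot reach) or the next jagged valley to be filled in? The essay calls this the decisive unknown. Contested premise: Ng argues that the question itself presupposes that taste is a capability.
If taste can be automated, what, if anything, remains as a durable human comparative advantage in AI development?
How would rubber-stamping be measured? "Humans set the direction" can be true on paper while the real judgment quietly shifts to the model.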

Sources

When AI builds itself: §"What might the future of work at Anthropic look like?" and §"What if we're wrong?"
Thread by @AndrewYNg: Andrew Ng, The Batch (2026-06-30), practitioner-opinion; reframes the residual human contribution as "context advantage, not taste"
Really Big Test-Time Compute in AI Changes Benchmarks, Safety and Research with OpenAI's Noam Brown: Noam Brown (No Priors, 2026-06-26), practitioner-opinion; research taste as the residual human role; models "can't yet fully replace the whole research loop," but an inflection point is expected


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 27

The Abstraction Barrier
Lerchner's hypothesis that AI trained on human concepts may be unable to discover genuinely novel conceptual primitives…
AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
AI R&D Autonomy Evaluation (AECI)
How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives…
Andrew Ng
Founder of DeepLearning.AI and AI Fund, founding lead of Google Brain, co-founder of Coursera; writes The Batch, where…
Artificial Superintelligence (ASI)
DeepMind's informal characterization of ASI as a system that exceeds large, well-coordinated human-expert collectives a…
Autonomous Scientific Discovery
Mythos-class models now conduct novel science with limited human input — autonomous protein/drug design (~10× faster, m…
Compute Allocator
The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding…
Context Advantage, Not Taste
Andrew Ng's reframing of the residual human contribution: not 'taste' but an information asymmetry — 'so long as the hu…
Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
Implementation Abundance Inverts Product Work
Andrew Ambrosino's inversion thesis: when talking to a frontier model can stand up any feature from scratch, implementa…
Instrumental Convergence
Omohundro/Bostrom's thesis that whatever an AI's final goal, it tends to pursue universally useful sub-goals — resource…
Jagged Intelligence (Ghosts, Not Animals)
"Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the l…
Superintelligence Trajectory
Map of Content for the superintelligence-trajectory domain — 20 concepts. The path from AGI to ASI: recursive self-impr…
Multi-Agent Collective Intelligence
DeepMind's fourth pathway to ASI: superintelligence as an emergent property of many coordinated AGI agents — group agen…
Noam Brown
OpenAI research scientist and a pioneer of inference-time (test-time) compute scaling; earlier built superhuman poker A…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Planning / Execution Division of Labor
Anthropic's 400K-session telemetry: in a typical Claude Code session humans make ~70% of planning decisions (what to do…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Researcher Uplift from Code Output
Thomas Kwa (METR) translates Anthropic's reported 8× code-per-engineer-per-day into serial researcher uplift with produ…
Returns to Expertise in Agentic Coding
Anthropic's 400K-session study: domain expertise (not coding skill) is what amplifies an agent — experts get 2× the act…
Role Averaging, Not Role Elimination
Andrew Ambrosino's nuanced OpenAI-side take on role collapse: your role is 'the average of what you spend your time on'…
RSI Growth Curves: Which Friction Binds First?
DeepMind's exponential/hyperbolic/S-curve growth shapes are Anthropic's compounding-efficiency/full-RSI/stalled futures…
The Bitter Lesson
Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolv…
Transformative Creativity
Boden's three-level model of creativity (combinational, exploratory, transformative) used to locate today's AI achievem…
Unknowns as the Agentic Bottleneck
Thariq Shihipar's map-vs-territory thesis: the gap between what you told the agent and what the work actually requires…
Verification as the New Bottleneck
Fiona Fung: coding is no longer the bottleneck — verification, review, maintenance are; shift-left; TDD loses its tax;…
Why AI Lags at Design
Andrew Ambrosino's four reasons frontier models are worse at visual/product design than at code: design is hard to grad…

Related links

Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Context Advantage, Not Taste
Andrew Ng's reframing of the residual human contribution: not 'taste' but an information asymmetry — 'so long as the hu…
Intelligence Explosion Dynamics
The growth-curve question behind recursive self-improvement: whether AI-accelerating-AI produces exponential, super-exp…
Task Time-Horizon Scaling
METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…
