
The Bitter Lesson

Published: May 13, 2026 · Filed: Concept · Domain: LLM Architecture · Tags: LLM Architecture, Principle · Reading: 5 min · Source: AI-synthesised

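Sutton 2019: general methods that scale beat hand-crafted structure. This is the argument that recurs across this wiki to justify "dissolving the harness into the model". Caveat: mechanical verification and character may not migrate into the model.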

Sources

Interaction Models: A Scalable Approach to Human-AI Collaboration

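Summary

Rich Sutton's 2019 essay: general methods that leverage computation (search, learning) ultimately beat methods that build in human knowledge and hand-crafted structure, and the gap grows large as compute grows. The "bitter" part is that this repeatedly surprises researchers who invested in elaborate domain structure, because that structure turned out to be a ceiling rather than a foundation.

This page exists because the principle recurs throughout the wiki as a load-bearing argument, explicitly invoked to justify dissolving the harness into the model.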

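Where it is invoked here

Interaction Models: TML cites "the bitter lesson" directly. Hand-crafted interaction systems (VAD, turn-detection, dialog-management harnesses) will be outpaced by advances in general capabilities, so for interactivity to scale with intelligence it must become part of the model itself. See Turn-Based Interface Bottleneck.
Encoder-Free Early Fusion: co-trains all modality components from scratch in a single transformer instead of stitching together pretrained encoders and decoders, which leaves fewer hand-crafted module boundaries.
Time-Aligned Micro-Turns: removes artificial turn boundaries so that interaction modes become scalable model behavior rather than separate harness code written for each mode.
Harness Shrinkage as Models Improve: applies the same logic to coding-agent harnesses. Prompt scaffolding compensates for what the model cannot yet do and should shrink as models improve. (The caveat there: mechanical verification, meaning tests, types, and linters, is the part that does not migrate inward.)
Agent Harness Engineering: "enforce invariants, not implementations". Let the model find the path itself; the harness encodes only what must be true.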
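
The "enforce invariants, not implementations" idea in the last item can be illustrated with a toy sketch. It is not taken from the wiki or from any real harness: the names `run_with_invariants`, `INVARIANTS`, `is_ordered`, and `is_permutation` are hypothetical, and a sorting task stands in for whatever the model is asked to do. The harness accepts any solver and only checks properties that must hold of the result, which is also the kind of mechanical verification that the caveat says stays outside the model.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# All names here are hypothetical and chosen for illustration only.
Solver = Callable[[List[int]], List[int]]
Invariant = Callable[[List[int], List[int]], bool]


def is_ordered(_given: List[int], produced: List[int]) -> bool:
    """True when every element is <= the one that follows it."""
    return all(a <= b for a, b in zip(produced, produced[1:]))


def is_permutation(given: List[int], produced: List[int]) -> bool:
    """True when the output holds exactly the input's elements."""
    return Counter(given) == Counter(produced)


# The harness knows what must be true, not how to achieve it.
INVARIANTS: Dict[str, Invariant] = {
    "output is ordered": is_ordered,
    "output is a permutation of the input": is_permutation,
}


def run_with_invariants(
    solver: Solver,
    data: List[int],
    invariants: Optional[Dict[str, Invariant]] = None,
) -> Tuple[List[int], List[str]]:
    """Run any solver, then report the names of the invariants it violated."""
    checks = INVARIANTS if invariants is None else invariants
    produced = solver(list(data))  # pass a copy so the solver cannot mutate the input
    violated = [name for name, holds in checks.items() if not holds(data, produced)]
    return produced, violated


if __name__ == "__main__":
    # A correct solver passes without the harness knowing how it works.
    print(run_with_invariants(sorted, [3, 1, 2]))               # ([1, 2, 3], [])
    # A solver that silently drops an element is caught by the permutation check.
    print(run_with_invariants(lambda xs: xs[:-1], [1, 2, 3]))
```

In this sketch the solver can be swapped for a better one without touching the checks, which is the sense in which prompt scaffolding can shrink while verification stays put.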

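The standard caveat

The bitter lesson is about capability and structure migrating into the model, not a claim that harnesses are useless. Some things legitimately stay outside the model: mechanical verification (the synthesis in Harness Shrinkage as Models Improve), organization-specific policy and style, safety boundaries, and, according to Claude Character as Product, deliberately crafted character and persona work. For each harness component, the open question is which side of that line it falls on.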

Sources

Interaction Models: A Scalable Approach to Human-AI Collaboration (explicitly cites "the bitter lesson")


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 27

Agent Harness Engineering
Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…
Agentic Loops Overtake Bespoke Systems
DeepMind's *basic* Ralph-loop agent matched its bespoke evolutionary+AlphaProof system as the LLM improved; the bitter…
AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
AI R&D Autonomy Evaluation (AECI)
How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives…
Opinions on Using AI Tools & the Future of the Software Engineering Role
Debate map of four stances on using AI tools (bullish-insider / pragmatist-practitioner / skeptic-governance / architec…
Andrej Karpathy
Co-founder OpenAI, ex-Tesla AI, Eureka Labs; coined "vibe coding," Software 1/2/3.0, "ghosts not animals," "agentic eng…
Build for the Next Model
Prototype the thing that almost works, not the thing that already works: bet that the next concrete model release (not…
Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
Compute Allocator
The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding…
Encoder-Free Early Fusion
Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch h…
Evolutionary Proof Search
The full-featured agent's mechanism: population DB of proof sketches, Elo via Plackett–Luce/Gibbs, P-UCB selection, LLM…
The Future of Agent Interfaces
Interface future is layered: native interaction models for human collaboration, MCP/APIs for structured action, app pro…
Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
HTML as the New Markdown
Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the *human*; HTML artifacts (visual…
Does the Human-Facing Harness (HTML Artifacts) Hit Its Own Bloat Ceiling?
Yes — HTML raises and reshapes the human-attention ceiling but can't remove it; bloat relocates from document-length to…
Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
MCP and Computer Use
Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slac…
LLM Architecture, Training & Alignment
Map of Content for the llm-architecture domain — 19 concepts. Curated entry point; see Home for all domains.
Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Research Taste as the Human Bottleneck
The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an a…
Software 3.0
Karpathy's taxonomy: 1.0 code, 2.0 weights, 3.0 prompting; LLM as programmable interpreter; MenuGen "shouldn't exist";…
Task Time-Horizon Scaling
METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…
Thinking Machines Lab
AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-session…
Time-Aligned Micro-Turns
The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; s…
Turn-Based Interface Bottleneck
Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…
The Verifiability Thesis
LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peak…

Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
Open Questions Backlog
_96 pages with open questions, as of 2026-06-14._
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
