
The Bitter Lesson

Published: May 13, 2026 · Filed: Concept · Domain: LLM Architecture · Tags: LLM Architecture, Principle · Reading: 5 min · Source: AI-synthesised

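Sutton 2019: general methods that scale beat hand-crafted structure. This is the recurring argument in this wiki for "dissolving the harness into the model". The caveat: mechanical verification and character may not migrate into the model.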



Summary

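Rich Sutton's 2019 essay argues that general methods that leverage computation (search and learning) ultimately beat methods that build in human knowledge and hand-crafted structure, and that the gap grows large as compute increases. The "bitter" part is that this keeps surprising researchers who invested in clever domain-specific structure, because that structure turned out to be a ceiling rather than a foundation.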

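This page exists because the principle recurs throughout the wiki as a load-bearing argument: it is invoked explicitly to justify dissolving the harness into the model.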

Where it is invoked here

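  • Interaction Models: TML cites "the bitter lesson" directly. Its argument is that hand-crafted interaction systems (VAD, turn-detection, dialog-management harnesses) will be outpaced by progress in general capabilities, so interactivity must become part of the model itself if it is to scale with intelligence. See Turn-Based Interface Bottleneck.
  • Encoder-Free Early Fusion: co-train all modality components from scratch in a single transformer rather than stitching together pretrained encoders and decoders, which leaves fewer hand-crafted module boundaries.
  • Time-Aligned Micro-Turns: remove artificial turn boundaries so that interaction modes become scalable model behavior rather than a separate piece of harness code per mode.
  • Harness Shrinkage as Models Improve: applies the same logic to coding-agent harnesses. Prompt scaffolding compensates for what the model cannot yet do and should shrink as the model improves. (The caveat there: mechanical verification, meaning tests, types, and linters, is the part that does not migrate inward.)
  • Agent Harness Engineering: "enforce invariants, not implementations". Let the model find its own path; the harness encodes only what must be true.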
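
The "invariants, not implementations" idea in the last two entries can be made concrete with a small sketch. It is an illustration only, not code from any of the linked pages: the names `Invariant`, `failed_invariants`, `model_attempt_a`, and `model_attempt_b` are hypothetical, and a sorting task stands in for whatever the agent is actually asked to do. The harness checks properties of the result and never inspects how the candidate got there.

```python
from dataclasses import dataclass
from typing import Callable

# A candidate is whatever the model produced; here, a sorting function.
Candidate = Callable[[list[int]], list[int]]


@dataclass(frozen=True)
class Invariant:
    """A property that must hold, independent of how the candidate works."""
    name: str
    check: Callable[[Candidate], bool]


def output_is_sorted(fn: Candidate) -> bool:
    out = fn([3, 1, 2, 2])
    return all(a <= b for a, b in zip(out, out[1:]))


def output_is_permutation(fn: Candidate) -> bool:
    data = [3, 1, 2, 2]
    return sorted(fn(list(data))) == sorted(data)


# The harness encodes only what must be true (hypothetical example set).
INVARIANTS = [
    Invariant("sorted", output_is_sorted),
    Invariant("permutation", output_is_permutation),
]


def failed_invariants(fn: Candidate) -> list[str]:
    """Return the names of violated invariants; an empty list means accept."""
    return [inv.name for inv in INVARIANTS if not inv.check(fn)]


# Two hypothetical model outputs; the harness never looks at their code paths.
def model_attempt_a(xs: list[int]) -> list[int]:
    return sorted(xs)


def model_attempt_b(xs: list[int]) -> list[int]:
    return list(dict.fromkeys(xs))  # drops duplicates and keeps input order


print(failed_invariants(model_attempt_a))  # []
print(failed_invariants(model_attempt_b))  # ['sorted', 'permutation']
```

Under this framing, a stronger model changes which attempts pass, while the invariant list itself stays the same. That is the sense in which mechanical verification is described above as the part that does not migrate inward.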

The standard caveat

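The bitter lesson is about capability and structure migrating into the model, not a claim that the harness is useless. Some things legitimately stay outside the model: mechanical verification (the synthesis in Harness Shrinkage as Models Improve), organization-specific policy and style, safety boundaries, and, per Claude Character as Product, deliberate character and persona work. For every harness component, the open question is which side of that line it falls on.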
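
One way to keep that question visible is to tag each harness component with the side of the line it is currently believed to fall on. The sketch below is a reading aid under stated assumptions: the `Side` enum and `candidates_for_pruning` are hypothetical names, and the placements simply restate this page's own examples rather than any measured result.

```python
from enum import Enum


class Side(Enum):
    MIGRATES_INWARD = "expected to be absorbed by the model"
    STAYS_OUTSIDE = "expected to remain in the harness"


# Placements follow this page's examples; they are assumptions, not findings.
COMPONENTS = {
    "turn detection / VAD": Side.MIGRATES_INWARD,
    "dialog-management harness": Side.MIGRATES_INWARD,
    "prompt scaffolding": Side.MIGRATES_INWARD,
    "mechanical verification (tests, types, linters)": Side.STAYS_OUTSIDE,
    "organization-specific policy and style": Side.STAYS_OUTSIDE,
    "safety boundaries": Side.STAYS_OUTSIDE,
    "deliberate character work": Side.STAYS_OUTSIDE,
}


def candidates_for_pruning(components: dict[str, Side]) -> list[str]:
    """Components worth re-testing for removal at each model release."""
    return sorted(n for n, s in components.items() if s is Side.MIGRATES_INWARD)


print(candidates_for_pruning(COMPONENTS))
```

The table is meant to be revised over time: an entry such as character work is listed as staying outside only because the page treats it that way today.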

Related links

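  • Evolutionary Proof Search: its bespoke evolutionary machinery is exactly what the bitter lesson predicts will be absorbed
  • Interaction Models: the most explicit recent invocation
  • Turn-Based Interface Bottleneck: argues that a less intelligent harness eventually loses to scale
  • Harness Shrinkage as Models Improve: the coding-agent version, with the mechanical-verification caveat
  • Agent Harness Engineering: "invariants, not implementations" as a design rule that is aware of the bitter lesson
  • Encoder-Free Early Fusion / Time-Aligned Micro-Turns: architectural choices grounded in it
  • Claude Character as Product: a possible counterexample, since character may not migrate inward
  • Model Spec Midtraining (MSM): moving alignment from prompt injection in the harness to values internalized by the model is a bitter-lesson move along the alignment axis
  • Compute Allocator: names what stays on the human side of the line. Allocation decisions, and the human-facing scaffolding that supports them, do not migrate inward even when model-facing structure does
  • HTML as the New Markdown: leaving room for the model to surprise you is the prompt-level form of the lesson. The caveat is that human-facing readability (HTML artifacts) sits on the side that does not dissolve into the model
  • MCP and Computer Use: Boris Cherny's remark that, to the model, these are all just tokens turns the choice of substrate (MCP, API, or computer use) into the model's decision rather than the harness's; this is the bitter-lesson endpoint for tool dispatch
  • Agentic Loops Overtake Bespoke Systems: the clearest empirical confirmation in the corpus. As LLMs improved, DeepMind's simple agentic loop matched its bespoke trained system (AlphaProof plus evolutionary search) on open math problems
  • AI R&D Autonomy Evaluation (AECI): if the bitter lesson runs all the way, scaled general methods eventually improve themselves; AECI is how Anthropic measures whether that threshold is near
  • Recursive Self-Improvement: the furthest extrapolation of the principle. If research progress is mainly a function of tools and resources, then the effort (the 99%) becomes automatable
  • AI Accelerating AI Development: the empirical case. The kernel-optimization loop's improvement from 3× to 52× is a measured instance of scaled general methods beating manual tuning
  • Research Taste as the Human Bottleneck: an unsettled bet on the last stubborn holdout. Is research taste a real ceiling, or the next structure the bitter lesson will dissolve?
  • Build for the Next Model: the product-strategy corollary. Since capability keeps migrating inward with each release, prototype the thing that almost works and let the next model close the gap, rather than engineering around it
  • Task Time-Horizon Scaling: rising general benchmark capability is the curve that keeps eroding the advantage of hand-crafted scaffolding
  • The Verifiability Thesis: Karpathy's account of why scaled RL beats hand engineering, namely that labs pour compute into environments with verifiable rewards
  • Software 3.0: the extrapolation of the neural network as host process is the bitter lesson pushed all the way down to the hardware layer
  • Andrej Karpathy: a regular user of the principle (verifiability, ghosts, and Software 3.0 all build on it)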

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

