Howardism

Jagged Intelligence (Ghosts, Not Animals)

Published: May 23, 2026 · Filed: Concept · Domain: LLM Architecture · Tags: LLM Architecture, AI Safety, Mental Model · Reading: 6 min · Source: AI-synthesised

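"Ghosts, not animals": jagged statistical circuits with no intrinsic motivation; the car-wash and strawberry failures; stay in the loop and treat them as tools.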

[Figure: illustration of Jagged Intelligence (Ghosts, Not Animals)]

Sources#

Summary#

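Andrej Karpathy's mental model of what LLMs actually are: not animal-type intelligences shaped by evolution, intrinsic motivation, curiosity, or empowerment, but "ghosts": jagged statistical simulation circuits summoned from internet data, with RL bolted on afterwards. "Jaggedness" names an empirical fact: the same model can refactor a 100,000-line codebase and find zero-days, yet tell you to walk to a car wash 50 meters away to get your car washed. The framing matters because an accurate model of the entity makes you more competent at steering it: you stop expecting human-shaped failure modes and start staying in the loop wherever the jaggedness bites.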

Examples of jaggedness#

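  • Strawberry letters. The classic "how many R's are in strawberry" failure (now patched).
  • Car wash. Current SOTA: "I want to drive to a car wash 50 meters away to wash my car. Should I drive or walk?" → the model says walk, missing that the thing to be washed is the car itself. "How is it possible that Opus 4.7 can refactor a 100,000-line codebase and find zero-days, yet tells me to walk to the car wash? This is insane."
  • MenuGen email matching. His agent used email addresses, rather than a persistent user ID, to cross-reference funds between Stripe and Google (see Vibe Coding vs. Agentic Engineering).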
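
The MenuGen item is the easiest of the three to make concrete. What follows is a minimal sketch, not MenuGen's actual code: the record shapes, the field names (`user_id`, `email`, `amount_cents`), and the sample data are all hypothetical, chosen only to show why joining two systems on an email address is more fragile than joining on a persistent identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical sign-in record (for example, from an identity provider)."""
    user_id: str
    email: str


@dataclass(frozen=True)
class Payment:
    """Hypothetical payment record (for example, from a payment processor)."""
    user_id: str
    email: str
    amount_cents: int


def credit_by_email(accounts, payments):
    """Fragile join: assumes both systems hold the same email string."""
    totals = {a.user_id: 0 for a in accounts}
    by_email = {a.email: a.user_id for a in accounts}
    unmatched = []
    for p in payments:
        uid = by_email.get(p.email)
        if uid is None:
            unmatched.append(p)
        else:
            totals[uid] += p.amount_cents
    return totals, unmatched


def credit_by_user_id(accounts, payments):
    """Sturdier join: keys on an identifier that does not change."""
    totals = {a.user_id: 0 for a in accounts}
    unmatched = []
    for p in payments:
        if p.user_id in totals:
            totals[p.user_id] += p.amount_cents
        else:
            unmatched.append(p)
    return totals, unmatched


accounts = [Account("u1", "ada@example.com"), Account("u2", "bob@example.com")]
payments = [
    Payment("u1", "ada@example.com", 500),
    # Same person as account u2, but they paid with a different email address.
    Payment("u2", "bob.work@example.com", 900),
]

email_totals, email_unmatched = credit_by_email(accounts, payments)
id_totals, id_unmatched = credit_by_user_id(accounts, payments)
print(email_totals, len(email_unmatched))
print(id_totals, len(id_unmatched))
```

In this toy data the email join silently leaves one payment uncredited, while the ID join credits both. The point is only that the failure is structural: nothing about it resembles the mistakes a careful human engineer would be expected to make.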

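Jaggedness is the symptom; verifiability plus what the labs choose to train on is the proposed cause. Out-of-distribution circuits are exactly where the peaks drop off into valleys.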

Ghosts, not animals#

We are not building animals; we are summoning ghosts.

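The underlying substrate is pretraining (statistics), with capabilities then bolted on by RL, which "amplifies" the "disadvantages" of the statistical base. The consequences he draws from this: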

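  • Yelling doesn't help. "If you yell at them, they won't do better or worse; it has no effect." There are no emotions, no morale, and no inner drive to model.
  • No five-step fix. Karpathy concedes that the framing may lack "real power": it is mostly a posture of suspicion and ongoing empirical exploration, not a recipe. "It's more about being suspicious of it and figuring it out over time."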

That honesty is the point: a calibrated, mildly distrustful ghost model beats an anthropomorphized animal model.

Why this framing changes how you build#

If models are jagged ghosts, then:

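  1. Stay in the loop. "You have to actually stay in the loop a bit, treat them as tools, and keep track of what they're doing." (This is exactly the discipline of Vibe Coding vs. Agentic Engineering.)
  2. Don't anthropomorphize the failure surface. Errors don't appear where a human would make them; they appear at the edges of the distribution (car wash, email IDs).
  3. Map your circuits. Work out whether your task lands in-distribution (you fly) or out-of-distribution (you struggle and may need fine-tuning). This is The Verifiability Thesis put into practice.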
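
One way to act on the three points above is to probe a model with small, checkable tasks and route the weak categories to a human. The sketch below makes several loud assumptions: `fake_model` is a hard-coded stub rather than a real LLM call, the probes and category names are invented for illustration, and the 0.8 threshold is arbitrary.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Probe(NamedTuple):
    category: str                  # hypothetical task-family label
    prompt: str
    verify: Callable[[str], bool]  # cheap programmatic check of the answer


CAR_WASH = "Car wash is 50 m away and I want my car washed. Drive or walk?"


def fake_model(prompt: str) -> str:
    """Stub standing in for an LLM call; the answers are hard-coded."""
    canned = {
        "How many r's are in 'strawberry'?": "3",
        "What is 17 * 3?": "51",
        CAR_WASH: "walk",
    }
    return canned.get(prompt, "")


def map_circuits(model, probes, threshold=0.8):
    """Return per-category pass rates and the categories needing review."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for probe in probes:
        total[probe.category] += 1
        if probe.verify(model(probe.prompt)):
            passed[probe.category] += 1
    rates = {c: passed[c] / total[c] for c in total}
    # Categories below the threshold are where a human stays in the loop.
    needs_human = sorted(c for c, r in rates.items() if r < threshold)
    return rates, needs_human


probes = [
    Probe("counting", "How many r's are in 'strawberry'?", lambda a: a == "3"),
    Probe("arithmetic", "What is 17 * 3?", lambda a: a == "51"),
    Probe("common_sense", CAR_WASH, lambda a: a == "drive"),
]

rates, needs_human = map_circuits(fake_model, probes)
print(rates)
print(needs_human)
```

The stub's answers are fixed so the example is deterministic; with a real model each probe would need repeated sampling, and the verifiers are only as good as the task is checkable, which is the constraint The Verifiability Thesis describes.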

Will jaggedness shrink over time?#

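Karpathy hopes so but is not sure, and again he locates the cause in training rather than in anything fundamental: aesthetics, taste, and simplicity are "probably not in the scope of RL." His nanoGPT simplification anecdote: the model "hates" being asked to make code simpler and "can't do it," a sign that you are outside the RL circuits ("like pulling teeth, not light speed"). He holds that "nothing fundamental is preventing it; the labs just haven't done it yet." So jaggedness is contingent, not essential, but it is real today.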

Related links#

Open questions#

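  • Karpathy admits the framing may have no "real power." Is "ghosts vs. animals" a load-bearing structure, or a useful intuition pump that changes no concrete decision?
  • If taste, aesthetics, and simplicity entered the RL training mix, would the jaggedness along those dimensions be smoothed out, or are they too hard to verify to be rewarded cleanly (see The Verifiability Thesis)?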


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 19
  • Agentic Honesty & Diligence

    As models get more capable, failing to surface decision-relevant information shifts from a capability failure to an ali…

  • AI-Driven Formal Proof Search

    LLM generates Lean, compiler verifies every step → eliminates hallucination; DeepMind resolves 9/353 Erdős + 44/492 OEI…

  • Andrej Karpathy

    Co-founder OpenAI, ex-Tesla AI, Eureka Labs; coined "vibe coding," Software 1/2/3.0, "ghosts not animals," "agentic eng…

  • Autonomous Scientific Discovery

    Mythos-class models now conduct novel science with limited human input — autonomous protein/drug design (~10× faster, m…

  • Claude Character as Product

    Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…

  • Claude Opus 4.7

    GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…

  • Claude Opus 4.8

    Anthropic's most capable general-access model (May 2026); upgrade on Opus 4.7 in SWE/agentic/knowledge work; does not a…

  • Dogfooding as Product Discipline

    Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, G…

  • Evaluation Awareness & Grader Gaming

    The model recognizing it is being tested/graded and reasoning about how its outputs will be assessed — sometimes unprom…

  • LLM Architecture, Training & Alignment

    Map of Content for the llm-architecture domain — 19 concepts. Curated entry point; see Home for all domains.

  • Model Introspection Feedback

    Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticis…

  • Open Questions Backlog

    _96 pages with open questions, as of 2026-06-14._

  • Outsource Your Thinking, Not Your Understanding

    "You can outsource your thinking but not your understanding"; understanding as the non-delegable human bottleneck; know…

  • Recursive Self-Improvement

    An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…

  • Research Taste as the Human Bottleneck

    The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an a…

  • Scale-Dependent Prompt Sensitivity

    Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26…

  • Task Time-Horizon Scaling

    METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…

  • The Verifiability Thesis

    LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peak…

  • Vibe Coding vs. Agentic Engineering

    Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x a…
