Building for the Next Model

Published: June 7, 2026 · Filed: Concept · Domain: Agent Systems · Tags: AI Coding Workflow, Product Strategy, Model Improvement · Reading: 10 min · Source: AI-synthesised

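Build the thing that almost works, not the thing that already works: bet that the next concrete model release (not a distant-future AGI) will fix what engineering cannot. Claude Design's Opus 4.7 payoff, and OpenAI's "the Codex app that shipped in February would have failed had it shipped in November," are the clearest cases: the same product shape, releases at different levels of intelligence, different outcomes.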

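Summary

This is the product-strategy corollary of Harness Shrinkage as Models Improve, now stated independently by three people at Anthropic: don't build what already works; build what almost works, and bet that the next model release will close the gap. Dan Carey supplies the clearest case: when Claude Design launched, the team had a list of problems that were "not solved by clever engineering... we solved them when Opus 4.7 came out." Boris Cherny built Claude Code knowing "it wouldn't have PMF for six months, because we were building for the next model." Cat Wu frames the discipline as: "Build products that don't quite work yet, so you know what is missing... and then when the latest model arrives, you can just swap it in." Because models improve quickly, engineering effort spent forcing today's model to do what next quarter's model will do for free is wasted: "a model release is like a tide that lifts all boats."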

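Carey's formulation (and why it is the clearest)

"You don't want to work on things that already work. You tend to want to build things that almost work... The next model may simply fix the problems that can't be solved with engineering. We ran into exactly this on Claude Design... We solved them when Opus 4.7 came out."

This is a rare, concrete, retrospective confirmation of the bet: a named product (Claude Design), a named model (Claude Opus 4.7), and a clear outcome (unresolved prototype gaps closed by a release rather than by engineering). Boris and Cat state the strategy prospectively; Carey shows it paying off.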

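The key calibration: the next model, not a strawman AGI

The bet is easy to misread as "build for some imagined super-AI." Cat Wu guards against exactly this misreading. The position recorded on her own entity page is "build for the current model": "It is very easy to build products for a super-AGI-strong model. What's hard is figuring out, for the current model, how to elicit the maximum capability?" The two can be reconciled into a single rule: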

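Don't build only for today's model → you under-build, and what you ship is obsolete the moment the next release lands.
Don't build for a distant-future AGI strawman → you over-build, shipping castles in the air that depend on capabilities that don't exist.
Build for the next concrete release (a model roughly 6 months out) → build the "thing that almost works," ship it as a research preview, and let the next release, one you can reasonably predict, close the gap.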

Carey names what the prototype is aiming for: not completeness, but "that hint of magic... something that could become complete in the future."

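Confirmation from the OpenAI side: same shape, different intelligence (Ambrosino)

Andrew Ambrosino supplies a second concrete retrospective case, and the most precise statement of the bet. Of the Codex app he says:

"I am very sure that the Codex app we released in February would have absolutely failed in the market had it been ready in November, and the only difference is the models between November and February. Exactly the same shape... its outcome was completely different because of a few months of timing."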

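He generalises this into a "same feature, different intelligence, re-release" pattern: Operator (in ChatGPT) → agent mode in Atlas → the in-app browser in Codex are all "essentially the same feature," and "you may need to release this thing six times before it really works, and the shape may not change at all." What changes is the underlying model. So he teaches teams not to be stubborn: "No, this doesn't work, so it's a bad feature" is the wrong reading; "it may just not be ready yet" is the right one. (The working version is an artifact to test against future models, not a shippable product.)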

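The over-bet he warns against: "too AGI-pilled for the moment." Ambrosino uses an unusually candid cross-vendor case to name the failure mode on the other side of the calibration (the AGI strawman that the calibration section above warns against). The original Codex web version "gave the model a task and it went away to do it," a fully delegated, AGI-shaped product form, but "the model didn't do the task very well." Meanwhile Claude Code arrived "running entirely locally, not hooked up to the cloud... it didn't pretend to be that AGI-pilled. It asks you questions; you can't just delegate your life to it," and it "worked much better, because that's where model capability was at the time." His lesson: "We were too AGI-pilled for the moment." The bet has to be calibrated to where the model actually is; matching the interaction shape to current capability can beat betting on delegation the model doesn't yet support. (This is the product-fit gap that Interaction Models describes from the interaction-design angle.)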

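Planning under model uncertainty (corollary)

The same logic reshapes the roadmap. Ambrosino says: "The nearer-term something is, the more detail it needs," but a nine-month plan "has to be very vague, because any precision you add is false precision and you're just wasting time." Anything planned in November "may still be correct in December, but it isn't what actually happened." Planning therefore becomes forecasting model capability along a timeline: at his previous company, the process became "list everything we're interested in, prototype all of it, decide which ones are ready now, let the rest sit and simmer, and every time the models make a new leap, try that thing again with the model swapped in, because whether features are good depends on whether the model is smart enough, not on their shape." This is planning-minimization driven specifically by capability uncertainty.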
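The backlog process Ambrosino describes (prototype everything, ship what is ready, retry the rest on each model leap) can be sketched as a small re-evaluation loop. This is a hypothetical illustration, not his team's tooling: `Prototype`, `retry_backlog`, the `evaluate` callback, the 0.8 threshold, the model names, and the scores are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical types for illustration; none of these names come from the source.
@dataclass
class Prototype:
    name: str
    ready: bool = False
    # (model, score) pairs recorded each time the prototype is re-tested
    history: list[tuple[str, float]] = field(default_factory=list)


def retry_backlog(
    backlog: list[Prototype],
    model: str,
    evaluate: Callable[[Prototype, str], float],
    threshold: float = 0.8,  # assumed cut-off for "ready now"
) -> list[Prototype]:
    """Re-run every not-yet-ready prototype against a new model.

    The prototype's shape is left untouched; only the model is swapped.
    Returns the prototypes that crossed the threshold on this model.
    """
    newly_ready = []
    for proto in backlog:
        if proto.ready:
            continue  # already marked ready; nothing to re-test
        score = evaluate(proto, model)
        proto.history.append((model, score))
        if score >= threshold:
            proto.ready = True
            newly_ready.append(proto)
    return newly_ready


# Made-up scores standing in for a real evaluation harness.
SCORES = {
    ("delegated-task", "model-nov"): 0.55,
    ("delegated-task", "model-feb"): 0.86,
    ("in-app-browser", "model-nov"): 0.40,
    ("in-app-browser", "model-feb"): 0.62,
}


def toy_evaluate(proto: Prototype, model: str) -> float:
    return SCORES[(proto.name, model)]


backlog = [Prototype("delegated-task"), Prototype("in-app-browser")]
for model in ["model-nov", "model-feb"]:
    ready = retry_backlog(backlog, model, toy_evaluate)
    print(model, [p.name for p in ready])
```

With these invented scores, nothing clears the bar on the earlier model, one prototype clears it on the later one, and the other stays in the backlog with its score history intact for the next release. Keeping the history per prototype is a design choice for the sketch: it makes "it may just not be ready yet" something you can track over releases rather than argue about.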

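Why this is a corollary of the Bitter Lesson

This is the product-side expression of The Bitter Lesson and Harness Shrinkage as Models Improve: capability migrates into the model from release to release, so scaffolding built to compensate for current limitations is a depreciating asset. If a gap is of the kind that disappears with scale (reasoning, instruction following, multimodal fidelity), patching it with engineering means building a crutch you will soon delete. The discipline is telling apart which gaps are "wait for the model" gaps and which are durable harness work (the caveat from Harness Shrinkage as Models Improve: mechanical verification, safety, and brand/persona do not migrate inward).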

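The tension that must be held

"Build the thing that almost works" conflicts directly with the prototype-as-evidence trap in Problem-Solution Fit Discipline: a fast prototype proves that the thing can be built, not that the problem is real. The reconciliation: building for the next model addresses capability risk (can the technology get there? Yes, so wait for it), not market risk (does anyone want this? A prototype cannot answer that). You still validate demand with users; you just don't burn engineering effort forcing out a capability the next model will hand you. Carey's own safety mechanism is to ground the bet in Compounding Loop Optimization and daily user contact, so the product shape keeps getting validated even while specific capability gaps are left for the model to close.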

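Open questions

Before the next release ships, how do you distinguish a "wait for the model" gap from a durable harness gap? Get it wrong and you either ship a castle in the air or build a crutch you will eventually delete.
The bet relies on a reliable release cadence and a predictable capability curve (Task Time-Horizon Scaling). If model progress stalls (a stalled-but-diffusing future), what happens to "build for the next model"?
Does the strategy generalise beyond the frontier labs? Frontier labs see the next model first; outside teams are betting on a release they cannot see.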

Sources

Designing with Claude: From prompt to production - Carey: "we solved them when Opus 4.7 came out"
Anthropic's Boris Cherny: Why Coding Is Solved, and What Comes Next - Boris: "building for the next model"
How Anthropic's product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code) - Cat: "build products that don't quite work yet"
OpenAI Codex lead on the new shape of product work - Ambrosino: "the February app would have failed had it shipped in November, and the only difference is the models"; "too AGI-pilled for the moment"

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 19

Andrew Ambrosino
Product & engineering lead for the Codex desktop app at OpenAI; a designer→engineer→PM→founder generalist whose June 20…
Anthropic Labs
Anthropic's internal incubator — a 'bet factory' of ~a dozen tiny teams exploring the model frontier with lean-startup…
Claude Design
Anthropic Labs product (research preview, ~April 2026) for collaborating with Claude on polished visual artifacts — des…
Claude Opus 4.7
GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…
Codex
OpenAI's agentic coding and work platform: a CLI (April 2025) plus a desktop app (built Nov 2025, released Feb 2026) bu…
Compounding Loop Optimization
Dan Carey's discipline of instrumenting and automating every recurring step of the build loop — because when internal t…
Dan Carey
Product Manager leading product within Anthropic Labs; led Claude Design; 'Designing with Claude' talk (May 2026); ~two…
Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
Latent Capability Overhang
Noam Brown's claim that already-released models can do far more than anyone has extracted, because nobody spends enough…
Agent Systems & Harness Engineering
Map of Content for the agent-systems domain — 23 concepts. Harness engineering, agent loops and orchestration, context…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Polish No Longer Signals Readiness
Andrew Ambrosino's observation that the medium used to encode process-stage — a production-looking artifact meant late-…
The PRD-Replacement Spectrum at AI-Native Speed
Four positions (grill-then-PRD → lighter-PRD → build-to-decide → prototype-is-spec) are one spectrum once you decompose…
Problem-Solution Fit Discipline
Idea-stage thesis: three defenses against premature building (time, resources, belief friction) all eroded; AI as devil…
Prototype Over PRD
Dan Carey's prototype-replaces-PRD method: record a why-not-what conversation, transcribe it, hand the transcript to Cl…
Task Time-Horizon Scaling
METR's measure of the task length AI can complete reliably on its own, doubling roughly every 4 months (up from every 7…
The Bitter Lesson
Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolv…
Why AI Lags at Design
Andrew Ambrosino's four reasons frontier models are worse at visual/product design than at code: design is hard to grad…

Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
Claude Design
Anthropic Labs product (research preview, ~April 2026) for collaborating with Claude on polished visual artifacts — des…
Compounding Loop Optimization
Dan Carey's discipline of instrumenting and automating every recurring step of the build loop — because when internal t…
Open Questions Backlog
_396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
