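Sources#
Summary#
Thinking Machines Lab's framing of why current AI interfaces limit collaboration: the turn-based interface is a bandwidth bottleneck between human and model. This is the problem that Interaction Models aim to dissolve.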
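The two-part argument#
- AI labs over-optimize for autonomy. Labs treat autonomous capability as a model's most important property; as a result, today's models and interfaces are "not optimized for keeping humans continuously in the loop." But in most real work, users cannot fully specify their requirements up front and then walk away; good results come from a collaborative loop of repeated clarification and feedback.
- Humans are pushed out by the interface, not by the work itself. "Humans are increasingly pushed out, not because the work does not need them, but because the interface has no room for them." The fix is to let people collaborate with AI the way they collaborate with other people: messaging, talking, listening, watching, showing, interjecting, with the model doing the same.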
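Mechanism: a single thread#
Today's models "experience reality in a single thread":
- Until the user finishes typing or speaking, the model waits and has no perception of what the user is doing or how they are doing it.
- Until the model finishes generating, its perception is frozen: no new information gets in until it completes or is interrupted.
This narrow channel limits how much of a person's knowledge, intent, and judgment can reach the model, and it limits how legible the model's work is to the human. The analogy: "trying to resolve a critical disagreement over email rather than face to face."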
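A toy model can make the single-threaded behavior above concrete. It is a minimal sketch under one stated assumption: anything that happens mid-turn only reaches the model at the next turn boundary. The names `perceived_at` and `perception_delays` and all timestamps are hypothetical illustrations, not part of any real system.

```python
from bisect import bisect_left


def perceived_at(event_time, turn_boundaries):
    """Time at which a turn-based model first sees an event.

    Assumption (illustrative): an event is delivered at the first turn
    boundary at or after it. `turn_boundaries` must be sorted ascending.
    Returns None if no later boundary exists.
    """
    i = bisect_left(turn_boundaries, event_time)
    return turn_boundaries[i] if i < len(turn_boundaries) else None


def perception_delays(event_times, turn_boundaries):
    """Delay in seconds between each event and the moment it is perceived."""
    delays = []
    for t in event_times:
        seen = perceived_at(t, turn_boundaries)
        delays.append(None if seen is None else round(seen - t, 3))
    return delays


if __name__ == "__main__":
    # Hypothetical session: the user's turn ends at 4.0 s,
    # the model's generation ends at 9.0 s.
    boundaries = [4.0, 9.0]
    events = [1.0, 4.0, 6.5, 9.5]
    print(perception_delays(events, boundaries))
```

In this sketch, an event at 6.5 s (say, the user frowning while the model is still generating) is not perceived until 9.0 s, and an event after the last boundary is never delivered at all. The delay is set by turn length, not by how capable the model is.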
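Why a harness cannot solve the problem#
Existing real-time systems attach interactivity through a harness: voice activity detection (VAD), turn-boundary prediction, and dialog state machines. These components are "markedly less intelligent than the model itself." The harness rules out whole classes of interaction patterns:
- Proactive interjection ("interrupt me when I say something wrong")
- Reacting to visual cues ("tell me when I write a bug in my code")
- Speaking while listening ("translate Spanish into English in real time")
- Speaking while watching ("give live commentary on this sports game")
The Bitter Lesson says that such hand-built systems get overtaken as general capability grows, so the fix is to make interactivity native to the model (see Time-Aligned Micro-Turns).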
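To illustrate the kind of harness described above, here is a deliberately simple sketch of an energy-threshold VAD driving a dialog state machine. Everything in it is an assumption made for illustration: the `State` enum, `is_speech`, `run_harness`, the 0.5 threshold, and the three-frame silence rule are hypothetical, and production VADs are considerably more sophisticated.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    USER_SPEAKING = "user_speaking"
    MODEL_SPEAKING = "model_speaking"


def is_speech(frame_energy, threshold=0.5):
    """Toy VAD: a frame counts as speech if its energy exceeds a threshold."""
    return frame_energy > threshold


def run_harness(frame_energies, silence_frames_to_end_turn=3):
    """Walk a toy dialog state machine over per-frame audio energies.

    Returns the frame indices at which the harness hands the turn to the
    model. In this sketch the model is consulted only at those points.
    """
    state = State.LISTENING
    silent_run = 0
    handoffs = []
    for i, energy in enumerate(frame_energies):
        speech = is_speech(energy)
        if state is State.LISTENING:
            if speech:
                state = State.USER_SPEAKING
                silent_run = 0
        elif state is State.USER_SPEAKING:
            silent_run = 0 if speech else silent_run + 1
            if silent_run >= silence_frames_to_end_turn:
                handoffs.append(i)  # harness decides the user's turn is over
                state = State.MODEL_SPEAKING
        elif state is State.MODEL_SPEAKING:
            if speech:
                # Barge-in: user speech cuts the model off.
                state = State.USER_SPEAKING
                silent_run = 0
    return handoffs


if __name__ == "__main__":
    energies = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1]
    print(run_harness(energies))
```

Two limits of this design are visible in the code. The model has no path to speak while the state is `USER_SPEAKING`, so proactive interjection is impossible by construction. And the turn boundary is chosen by a silence counter that knows nothing about meaning, so a thoughtful three-frame pause is indistinguishable from the end of a turn.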
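Related links#
- Interaction Models: the proposed solution
- The Bitter Lesson: why the harness-based status quo loses
- Time-Aligned Micro-Turns: the architectural move that removes turn boundaries
- Full-Duplex Interaction: the interaction patterns the bottleneck currently blocks
- Harness Shrinkage as Models Improve: the general version of "the less intelligent harness should dissolve into the model"
- AI Employee Framing / Human-AI Accountability Redesign: the organizational mirror image. Both warn against treating autonomy as the goal and pushing humans to the margins; this page is the interface-level version of the same critique
- Design Concept Grilling: argues that the value lies in collaborative iteration; this page argues that the interface is what blocks it
- Context Window Smart Zone: an orthogonal constraint that likewise makes "fully autonomous, walk away" workflows brittle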
