Sources#
Summary#
Interaction Models are architected as two cooperating models:
- a time-aware interaction model that maintains real-time presence — perceiving and responding in a continuous loop (see Time-Aligned Micro-Turns);
- an asynchronous background model that handles sustained reasoning, tool use, and longer-horizon work.
The payoff: the user gets both responsiveness and depth — "the planning, tool-use, and agentic workflows of reasoning models at the response latency of non-thinking ones."
How delegation works#
- When a task needs deeper reasoning than can be produced instantly, the interaction model delegates to the background model, which runs asynchronously.
- The handoff is a rich context package — not a standalone query, but the full conversation.
- The interaction model stays present throughout — answering follow-ups, taking new input, holding the thread.
- Results stream back as the background model produces them; the interaction model interleaves updates into the conversation at a moment appropriate to what the user is currently doing — not as an abrupt context switch.
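The delegation flow above can be sketched with two coroutines sharing a queue. This is only an illustrative sketch, not TML's implementation: the names `background_model` and `interaction_loop`, the `deep:` prefix used to trigger delegation, and the sleep durations are all assumptions made for the example.

```python
import asyncio


async def background_model(context: list[str], results: asyncio.Queue) -> None:
    """Hypothetical slow reasoner: streams partial results as it produces them."""
    for step in ("plan drafted", "tool call finished", "final answer ready"):
        await asyncio.sleep(0.01)  # stands in for sustained reasoning or tool use
        await results.put(f"background (context_turns={len(context)}): {step}")


def drain(results: asyncio.Queue, turns: list[str]) -> None:
    """Move any finished background updates into the conversation."""
    while not results.empty():
        turns.append(results.get_nowait())


async def interaction_loop(user_inputs: list[str]) -> list[str]:
    """Hypothetical fast loop: replies to every input, interleaving background updates."""
    turns: list[str] = []
    results: asyncio.Queue = asyncio.Queue()
    task = None

    for text in user_inputs:
        turns.append(f"user: {text}")
        if task is None and text.startswith("deep:"):
            # Hand off a snapshot of the full conversation, not just the query.
            task = asyncio.create_task(background_model(list(turns), results))
            turns.append("assistant: working on that in the background")
        else:
            # The interaction model stays present and answers immediately.
            turns.append(f"assistant: quick reply to {text}")
        # Surface background updates between exchanges, not mid-reply.
        drain(results, turns)
        await asyncio.sleep(0.02)  # time passing until the next user input

    if task is not None:
        await task  # collect whatever is still in flight
        drain(results, turns)
    return turns


if __name__ == "__main__":
    for line in asyncio.run(interaction_loop(["deep: plan a trip", "any update?", "thanks"])):
        print(line)
```

The design choice worth noticing is that the interaction loop never awaits the background task while the user is still providing input; it only polls the queue at natural pauses, which is one simple way to approximate "a moment appropriate to what the user is currently doing".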
Both halves are intelligent#
This isn't a "dumb frontend, smart backend" design. The interaction model on its own is "competitive on both interactive and intelligence benchmarks" (see Interactivity Benchmarks). For example, TML-Interaction-Small beats every non-thinking baseline on Audio MultiChallenge APR even without the background agent. Benchmarks marked * use the background agent for reasoning and tool tasks.
Relationship to other multi-model patterns#
This is the latency-vs-depth axis of multi-model orchestration, distinct from:
- the role-based model selection in Client-Side Agent Optimization (assign cheap or expensive models per role in an agent graph), where the split is cost-driven and static; here it is latency-driven and decided dynamically per turn;
- the three-agent / reviewer-in-fresh-context pattern (Deep Modules for Agents, Agent Harness Engineering), where the split is for context isolation; here it is for temporal concerns (stay responsive vs. think hard).
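The static-versus-dynamic contrast can be made concrete with a small sketch. Everything here is hypothetical: the model names, the `Request` type, the `estimated_seconds` field, and the 0.2-second budget are placeholders chosen for illustration, not values from either pattern's source.

```python
from dataclasses import dataclass

# Static, cost-driven: each role in the agent graph is pinned to a model up front.
ROLE_TO_MODEL = {"planner": "expensive-model", "summarizer": "cheap-model"}  # hypothetical names


@dataclass
class Request:
    text: str
    estimated_seconds: float  # hypothetical estimate of how long a good answer takes


def route_by_latency(request: Request, budget_seconds: float = 0.2) -> str:
    """Dynamic, latency-driven: decide per request whether to answer now or delegate."""
    if request.estimated_seconds <= budget_seconds:
        return "interaction"
    return "background"
```

In the static case the mapping is fixed before any request arrives; in the latency-driven case the same conversation can be routed differently from one exchange to the next.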
Open / acknowledged#
TML calls background agents "an essential capability" on which they have "just scratched the surface", both in pushing background agentic intelligence to the frontier and in exploring how background agents work together with the interaction model.
Connections#
- Interaction Models — parent concept
- Time-Aligned Micro-Turns — what keeps the interaction model present while the background model thinks
- Interactivity Benchmarks — *-marked results use the background agent; shows the split's contribution
- Client-Side Agent Optimization — a different axis of multi-model design (cost/role, not latency/depth)
- Deep Modules for Agents / Agent Harness Engineering — multi-agent splits for context isolation rather than latency
- Harness Shrinkage as Models Improve — open question whether the split is permanent or a transitional artifact until one model is both fast and deep enough
