Howardism · vol. 03 · quiet corner of the web
PLATE II · PIECE № 03

Interaction / Background Model Split

Published May 13, 2026 · Filed: Concept · Reading: 3 min · Source: AI-synthesised

Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency"



Summary

Interaction Models are architected as two cooperating models:

  • a time-aware interaction model that maintains real-time presence — perceiving and responding in a continuous loop (see Time-Aligned Micro-Turns);
  • an asynchronous background model that handles sustained reasoning, tool use, and longer-horizon work.

The payoff: the user gets both responsiveness and depth — "the planning, tool-use, and agentic workflows of reasoning models at the response latency of non-thinking ones."

How delegation works

  • When a task needs deeper reasoning than can be produced instantly, the interaction model delegates to the background model, which runs asynchronously.
  • The handoff is a rich context package — not a standalone query, but the full conversation.
  • The interaction model stays present throughout — answering follow-ups, taking new input, holding the thread.
  • Results stream back as the background model produces them; the interaction model interleaves updates into the conversation at a moment appropriate to what the user is currently doing — not as an abrupt context switch.
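The delegation loop above can be sketched with Python's asyncio. This is a toy model under stated assumptions, not TML's implementation: both models are stand-in coroutines, `needs_deep_reasoning` is a hypothetical heuristic, the `ContextPackage` shape is invented for illustration, a boolean `user_speaking` flag stands in for "a moment appropriate to what the user is currently doing", and the sleeps only simulate latency.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ContextPackage:
    """Hypothetical handoff payload: a snapshot of the whole conversation plus the task."""
    transcript: list[str]
    task: str


def needs_deep_reasoning(text: str) -> bool:
    """Hypothetical stand-in for the interaction model's 'can I answer this instantly?' check."""
    return text.startswith("plan")


async def background_model(pkg: ContextPackage, updates: asyncio.Queue) -> None:
    """Stand-in for the slow reasoning/tool-use model; streams partial results as it goes."""
    for step in ("gathered options", "drafted itinerary", "final plan"):
        await asyncio.sleep(0.05)  # simulated reasoning / tool latency
        await updates.put(f"{step} (from {len(pkg.transcript)} turns of context)")


async def interaction_model(events: list[tuple[str, bool]]) -> list[str]:
    """Stays present: replies to every event at once, delegates deep work, and
    surfaces background results only when the user is not mid-utterance."""
    transcript: list[str] = []
    updates: asyncio.Queue = asyncio.Queue()
    held: list[str] = []  # background results waiting for a good moment
    job = None

    def surface(user_speaking: bool) -> None:
        # Collect whatever the background model has streamed so far.
        while not updates.empty():
            held.append(updates.get_nowait())
        # Interleave held results only when the user is not talking.
        if held and not user_speaking:
            transcript.extend(f"assistant: update: {u}" for u in held)
            held.clear()

    for text, user_speaking in events:
        transcript.append(f"user: {text}")
        if job is None and needs_deep_reasoning(text):
            # Hand off the full conversation, not just the latest query.
            pkg = ContextPackage(transcript=list(transcript), task=text)
            job = asyncio.create_task(background_model(pkg, updates))
            transcript.append("assistant: on it, keep talking while I work")
        else:
            transcript.append(f"assistant: quick reply to: {text}")
        surface(user_speaking)
        await asyncio.sleep(0.06)  # one tick of the interaction loop

    if job is not None:
        await job  # let the background work finish, then flush what is left
    surface(user_speaking=False)
    return transcript


if __name__ == "__main__":
    events = [("hi", False), ("plan a weekend trip", False),
              ("what's the weather?", True), ("and tomorrow?", False)]
    for line in asyncio.run(interaction_model(events)):
        print(line)
```

In the resulting transcript the interaction model answers every turn immediately, holds any background updates that arrive while the user is speaking, and releases them on the next quiet turn. Gating on a single boolean is a deliberate simplification; a real system would use a much richer signal of what the user is doing.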

Both halves are intelligent

This isn't a "dumb frontend, smart backend" design. The interaction model on its own is "competitive on both interactive and intelligence benchmarks" (see Interactivity Benchmarks). For example, TML-Interaction-Small beats every non-thinking baseline on Audio MultiChallenge APR even without the background agent; in the reported results, only benchmarks marked with an asterisk (*) use the background agent for reasoning/tool tasks.

Relationship to other multi-model patterns

This is the latency-vs-depth axis of multi-model orchestration, distinct from:

  • the role-based model selection in Client-Side Agent Optimization, which assigns cheap or expensive models per role in an agent graph; there the split is cost-driven and static, whereas here it is latency-driven and decided dynamically on each turn;
  • the three-agent / reviewer-in-fresh-context pattern (Deep Modules for Agents, Agent Harness Engineering) — there the split is for context isolation; here it's for temporal concerns (stay responsive vs. think hard).
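To make the contrast with role-based selection concrete, here is a toy sketch of the two routing styles. Everything in it is hypothetical: the model names, the per-turn reasoning-time estimate, and the 300 ms latency budget are illustrative assumptions, not values from TML or from Client-Side Agent Optimization.

```python
from dataclasses import dataclass

# Static, cost-driven routing: each role in the agent graph is bound to a model
# once, up front. Model names are hypothetical placeholders.
ROLE_TO_MODEL = {"planner": "large-reasoning-model", "summarizer": "small-fast-model"}

# Hypothetical budget for how long the interaction model may think before replying.
LATENCY_BUDGET_MS = 300


@dataclass
class Turn:
    text: str
    est_reasoning_ms: int  # hypothetical estimate of the thinking time a good answer needs


def route_by_role(role: str) -> str:
    """Same role, same model, every time: the turn's content is never consulted."""
    return ROLE_TO_MODEL[role]


def route_by_latency(turn: Turn) -> str:
    """Decided per turn: the interaction model always replies, and it also delegates
    to the background model only when the work will not fit the latency budget."""
    if turn.est_reasoning_ms <= LATENCY_BUDGET_MS:
        return "reply now"
    return "reply now + delegate to background"


if __name__ == "__main__":
    for turn in [Turn("thanks!", 20), Turn("compare these three flight options", 45_000)]:
        print(f"{turn.text}: {route_by_latency(turn)}")
    print("planner:", route_by_role("planner"))
```

The structural difference is what each function looks at: the role, fixed when the graph is built, versus the current turn, re-evaluated every time the user says something.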

Open / acknowledged

TML calls background agents "an essential capability" on which it has "just scratched the surface". The open work runs in two directions: pushing background agentic intelligence to the frontier, and exploring how background agents work together with the interaction model.

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
