Howardism · vol. 03 · quiet corner of the web
PLATE II · PIECE № 09 · HOWARDISM

TML-Interaction-Small

Published: May 13, 2026 · Filed: Entity · Reading: 3 min · Source: AI-synthesised

TML's first interaction model: 276B MoE / 12B active, audio+video+text in / text+audio out, 200ms micro-turns, async background agent; best turn-taking latency of the models compared; research preview May 2026

What it is

Thinking Machines Lab's first interaction model, announced as a research preview in May 2026. Pitched as "the first model that has both strong intelligence/instruction following and interactivity."

  • Architecture: 276B-parameter MoE, 12B active. Trained from scratch as an interaction model (not a turn-based model with interactivity bolted on).
  • Modalities: continuous audio + video + text in; text + audio out. Encoder-Free Early Fusion (dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out), single shared transformer, all components co-trained from scratch.
  • Interaction mechanism: Time-Aligned Micro-Turns — 200ms interleaved input/output chunks, no turn boundaries.
  • Reasoning: delegates deep reasoning / tool use / long-horizon work to an async background model — see Interaction / Background Model Split. Competitive on intelligence benchmarks even without the background agent.
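
The micro-turn mechanism above can be pictured as a loop over fixed 200ms windows in which input and output share one clock. The sketch below is only an illustration of that interleaving: `MicroTurn`, `run_micro_turns`, and `toy_policy` are hypothetical names, the text strings stand in for audio/video chunks, and the policy is a toy rule, not the model's behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

CHUNK_MS = 200  # micro-turn length stated in the article


@dataclass
class MicroTurn:
    start_ms: int  # start of this window on the shared clock
    heard: str     # input observed in the window ("" means silence)
    said: str      # output emitted for the same window ("" means stay quiet)


def run_micro_turns(
    inputs: Iterable[str],
    step: Callable[[List[MicroTurn], str], str],
) -> Iterator[MicroTurn]:
    """Interleave one input chunk and one output chunk per 200ms window."""
    history: List[MicroTurn] = []
    for i, heard in enumerate(inputs):
        said = step(history, heard)  # decide output for this window only
        turn = MicroTurn(i * CHUNK_MS, heard, said)
        history.append(turn)
        yield turn


def toy_policy(history: List[MicroTurn], heard: str) -> str:
    """Hypothetical stand-in for the model: reply on the first silent chunk."""
    if heard:
        return ""  # user is still speaking, stay quiet
    if history and history[-1].heard:
        return "ok"  # speech just ended, respond in this window
    return ""


turns = list(run_micro_turns(["hi", "there", "", ""], toy_policy))
for t in turns:
    print(t.start_ms, repr(t.heard), repr(t.said))
```

There is no explicit end-of-turn signal anywhere in the loop: the decision to speak or stay silent is made once per window, which is the property the "no turn boundaries" phrasing describes.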

Headline numbers (May 2026)

  • Turn-taking latency: 0.40s (FD-bench v1, audio) — best of all models compared.
  • FD-bench v1.5 average: 77.8 vs ~39–54 for baselines including thinking-high models.
  • FD-bench v3 (audio+tools): 82.8% response quality / 68.0% Pass@1 (with background agent).
  • Audio MultiChallenge APR: 43.4%, which beats every non-thinking baseline; only GPT-realtime-2.0 xhigh (48.5%) scores higher.
  • Baselines compared: GPT-realtime-2.0 (minimal/xhigh), GPT-realtime-1.5, Gemini-3.1-flash-live-preview (minimal/high), Qwen 3.5 Omni-plus-realtime. Full table in Interactivity Benchmarks.
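
To make the size of these gaps concrete, the margins can be worked out directly from the figures quoted above. This is plain arithmetic on the listed numbers; the variable names are illustrative, and the baseline range is approximate because the source gives it as "~39–54".

```python
# Figures copied from the list above.
FD_V15_MODEL = 77.8
FD_V15_BASELINE_RANGE = (39.0, 54.0)  # approximate range from the source
APR_MODEL = 43.4
APR_BEST_BASELINE = 48.5  # GPT-realtime-2.0 xhigh


def margin(a: float, b: float) -> float:
    """Difference in points, rounded to one decimal place."""
    return round(a - b, 1)


fd_lead_min = margin(FD_V15_MODEL, FD_V15_BASELINE_RANGE[1])  # vs strongest baseline
fd_lead_max = margin(FD_V15_MODEL, FD_V15_BASELINE_RANGE[0])  # vs weakest baseline
apr_gap = margin(APR_BEST_BASELINE, APR_MODEL)                # points behind the leader

print(fd_lead_min, fd_lead_max, apr_gap)
```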

Limitations (acknowledged)

  • Long continuous A/V sessions accumulate context fast; careful context management is still an open problem (echoes Context Window Smart Zone).
  • Needs reliable low-latency connectivity; degrades badly without it.
  • "Small" because larger pretrained models are currently too slow to serve in this regime — larger models promised later in 2026.
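
A rough sense of why context accumulates quickly follows from the 200ms chunk length alone. The sketch below counts micro-turns per session and scales by a per-chunk token cost; `tokens_per_chunk` is a placeholder assumption for illustration, since the source gives no per-chunk token figure.

```python
CHUNK_MS = 200  # micro-turn length stated in the article


def micro_turns(session_minutes: float) -> int:
    """Number of 200ms windows in a session of the given length."""
    return int(session_minutes * 60 * 1000 // CHUNK_MS)


def context_tokens(session_minutes: float, tokens_per_chunk: int) -> int:
    """Total context consumed if every window costs tokens_per_chunk tokens.

    tokens_per_chunk is a hypothetical value, not a published figure.
    """
    return micro_turns(session_minutes) * tokens_per_chunk


per_minute = micro_turns(1)
per_hour = micro_turns(60)
hour_at_20 = context_tokens(60, tokens_per_chunk=20)  # 20 is an arbitrary example

print(per_minute, per_hour, hour_at_20)
```

Even with a small assumed per-chunk cost, an hour-long session produces thousands of windows, so the total grows linearly with session length unless older context is pruned or compressed.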

Availability

Limited research preview "in the coming months," wider release "later this year." Feedback solicited at interaction@thinkingmachines.ai; research grants open.

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

4 articles link here
  • Entity: Claude Opus 4.7

    GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…

  • Concept: Interaction Models

    Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…

  • Concept: Interactivity Benchmarks

    FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (vis…

  • Entity: Thinking Machines Lab

    AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-session…

Related articles
  • Concept: Interaction Models

    Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…

  • Concept: Interactivity Benchmarks

    FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (vis…

  • Concept: Full-Duplex Interaction

    Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speec…

  • Concept: Time-Aligned Micro-Turns

    The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; s…

  • Concept: Turn-Based Interface Bottleneck

    Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…