Howardism

Sources#

Interaction Models: A Scalable Approach to Human-AI Collaboration

What it is#

Thinking Machines Lab's first interaction model — released as a research preview, May 2026. Pitched as "the first model that has both strong intelligence/instruction following and interactivity."

Architecture: 276B-parameter MoE, 12B active. Trained from scratch as an interaction model (not a turn-based model with interactivity bolted on).
Modalities: continuous audio + video + text in; text + audio out. Encoder-Free Early Fusion (dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out), single shared transformer, all components co-trained from scratch.
Interaction mechanism: Time-Aligned Micro-Turns — 200ms interleaved input/output chunks, no turn boundaries.
Reasoning: delegates deep reasoning / tool use / long-horizon work to an async background model — see Interaction / Background Model Split. Competitive on intelligence benchmarks even without the background agent.

Headline numbers (May 2026)#

Turn-taking latency: 0.40s (FD-bench v1, audio) — best of all models compared.
FD-bench v1.5 average: 77.8 vs ~39–54 for baselines including thinking-high models.
FD-bench v3 (audio+tools): 82.8% response quality / 68.0% Pass@1 (with background agent).
Audio MultiChallenge APR: 43.4% — beats every non-thinking baseline; only GPT-realtime-2.0 xhigh (48.5%) higher.
Baselines compared: GPT-realtime-2.0 (minimal/xhigh), GPT-realtime-1.5, Gemini-3.1-flash-live-preview (minimal/high), Qwen 3.5 Omni-plus-realtime. Full table in Interactivity Benchmarks.

Limitations (acknowledged)#

Long continuous A/V sessions accumulate context fast — careful context management still an open problem (echoes Context Window Smart Zone).
Needs reliable low-latency connectivity; degrades badly without it.
"Small" because larger pretrained models are currently too slow to serve in this regime — larger models promised later in 2026.

Availability#

Limited research preview "in the coming months," wider release "later this year." Feedback solicited at interaction@thinkingmachines.ai; research grants open.

Connections#

Interaction Models — the model class
Thinking Machines Lab — who built it
Time-Aligned Micro-Turns / Encoder-Free Early Fusion / Interaction / Background Model Split — its three architectural pillars
Full-Duplex Interaction — the interaction modes it demonstrates
Interactivity Benchmarks — its full benchmark table and the baselines it beats
Claude Opus 4.7 — both era-mates (mid-2026 frontier); 4.7's xhigh effort tier mirrors GPT-realtime's minimal/xhigh, used here as a baseline config
Context Window Smart Zone — the long-session limitation

Sources#

Interaction Models: A Scalable Approach to Human-AI Collaboration

TML-Interaction-Small

Sources#

What it is#

Headline numbers (May 2026)#

Limitations (acknowledged)#

Availability#

Connections#

Sources#