What it is#
An AI research lab (publishes under "Thinking Machines Lab: Connectionism"). In this wiki it first appears as the org behind Interaction Models — a May 2026 research preview reframing real-time human-AI collaboration as a model-native capability rather than a harness concern.
What they've shipped / argued (as seen here)#
- Interaction Models (May 2026 research preview) — models that natively take in audio/video/text and think/respond/act in real time. First model: TML-Interaction-Small (276B MoE, 12B active).
- Position: interactivity should scale with intelligence → it must be in the model, citing The Bitter Lesson against harness-based real-time systems (voice activity detection, turn-detection).
- Engineering footprint: upstreamed a streaming-sessions feature to SGLang; published work on defeating nondeterminism in LLM inference via batch-invariant kernels (referenced in this wiki for trainer-sampler alignment); prior post: On-Policy Distillation.
- Running a research grant for interactivity / human-AI-collaboration benchmarks (details TBA); limited research preview of the interaction model "in the coming months," wider release "later this year"; larger models promised later in 2026.
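The nondeterminism point above is concrete enough to sketch: floating-point addition is non-associative, so a kernel whose reduction order depends on how requests were batched can return different results for identical inputs. A minimal NumPy illustration of the idea (hypothetical, not TML's actual kernels) contrasting a batch-dependent with a batch-invariant reduction:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

def split_sum(v, chunk):
    # Reduce in chunks whose size stands in for "how requests were batched":
    # a different chunk size yields a different rounding order.
    acc = np.float32(0.0)
    for i in range(0, len(v), chunk):
        acc = np.float32(acc + v[i:i + chunk].sum(dtype=np.float32))
    return acc

def fixed_order_sum(v):
    # Batch-invariant: one fixed left-to-right order regardless of batching,
    # so the result is bit-identical on every run.
    acc = np.float32(0.0)
    for e in v:
        acc = np.float32(acc + e)
    return acc

a = split_sum(x, 128)   # reduction order at one "batch size"
b = split_sum(x, 512)   # reduction order at another
# a and b can differ in the low bits even though the inputs are identical;
# fixed_order_sum(x) returns the same value every time by construction.
```

Real batch-invariant kernels apply the same principle inside GPU matmul, attention, and normalization reductions, at some throughput cost.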
How it connects#
- Stakes out a different priority than the labs critiqued in Turn-Based Interface Bottleneck ("AI labs over-optimize for autonomy") — implicitly positioning against the autonomy-first framing seen around Anthropic's and OpenAI's agent products (Claude Code, Symphony).
- Their harness-dissolves-into-model stance is the same shape as Harness Shrinkage as Models Improve (an Anthropic/Claude Code observation) — convergent thinking from different labs.
- Benchmarks their model against GPT-realtime-2.0 / 1.5 (OpenAI) and Gemini-3.1-flash-live (Google) and Qwen 3.5 Omni — see Interactivity Benchmarks.
Connections#
- Interaction Models — their headline research preview
- TML-Interaction-Small — the model
- The Bitter Lesson — the principle they invoke
- Turn-Based Interface Bottleneck — their critique of the status quo
- Interactivity Benchmarks — where they benchmark against OpenAI / Google / Alibaba models
- Harness Shrinkage as Models Improve — convergent thesis from Anthropic
- Anthropic — peer lab; different priority ordering (autonomy-first vs. interaction-first)
Sources#
8 articles link here
- Concept · Agent Harness Engineering
Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…
- Essay · Opinions on Using AI Tools & the Future of the Software Engineering Role
Debate map of four stances on using AI tools (bullish-insider / pragmatist-practitioner / skeptic-governance / architec…
- Entity · Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
- Concept · Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
- Concept · Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
- Concept · Interactivity Benchmarks
FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (vis…
- Entity · TML-Interaction-Small
TML's first interaction model: 276B MoE / 12B active, audio+video+text in / text+audio out, 200ms micro-turns, async ba…
- Concept · Turn-Based Interface Bottleneck
Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…
Related articles#
- Concept · Interaction Models
Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…
- Entity · Claude Opus 4.7
GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…
- Concept · Context Window Smart Zone
Smart zone vs dumb zone (Dex Hardy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised…
- Concept · Harness Shrinkage as Models Improve
Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from…
- Concept · The Bitter Lesson
Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolv…
