Howardism · vol. 03 · quiet corner of the web
Plate II · Piece № 02

Full-Duplex Interaction

Published: May 13, 2026 · Filed: Concept · Reading: 3 min · Source: AI-synthesised

Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speech, live translation/commentary, time-aware speech — all special cases of model behavior

Illustration for Full-Duplex Interaction

Summary

"Full-duplex" means the model perceives and responds at the same time, in a constant two-way exchange, as opposed to half-duplex turn-taking, where only one party acts at a time. Interaction Models generalize the audio full-duplex idea across audio, video, and text. In the post's phrase, the result is an experience that "feels more like collaborating and less like prompting."
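The contrast can be pictured as a toy asyncio sketch (everything here is invented for illustration): perception and response run as concurrent tasks over a shared stream, so output can begin while input is still arriving; a half-duplex system would wait for the whole input before responding.

```python
import asyncio

async def perceive(inbox: asyncio.Queue, events: list) -> None:
    # Ingest incoming chunks continuously; never blocked by our own output.
    for chunk in ["user says: hola", "user says: como estas"]:
        await inbox.put(chunk)
        events.append(("in", chunk))
        await asyncio.sleep(0.01)   # more input still on the way
    await inbox.put(None)           # end-of-stream sentinel

async def respond(inbox: asyncio.Queue, events: list) -> None:
    # React as soon as each chunk lands, while perception keeps running.
    while (chunk := await inbox.get()) is not None:
        events.append(("out", f"model reacts to <{chunk}>"))

async def full_duplex() -> list:
    events: list = []
    inbox: asyncio.Queue = asyncio.Queue()
    # Half-duplex would run these one after the other; full-duplex overlaps them.
    await asyncio.gather(perceive(inbox, events), respond(inbox, events))
    return events

events = asyncio.run(full_duplex())
```

In the resulting event log, the first "out" event appears before the last "in" event: responding overlaps perceiving rather than following it.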

The interaction modes it enables

All of these are special-purpose harnesses today; in an interaction model they're special cases of model behavior (see Time-Aligned Micro-Turns):

  • Proactive interjection — "interrupt when I say something wrong"; the model jumps in mid-turn when context warrants, not only at end-of-turn.
  • Visual-cue reactions — "tell me when I've written a bug in my code"; "count how many pushups I do"; requires acting on a visual change with no audio cue (audio-only turn-detection harnesses fail this — they say "Sure thing!" then go silent).
  • Simultaneous speech — user and model speak concurrently: "translate Spanish→English live."
  • Speak-while-watching — "live-commentate this sports game."
  • Time-aware speech — "remind me to breathe in and out every 4 seconds until I stop"; "how long did it take me to write this function?"
  • Codeswitch correction — "every time I use another language, give me the correct word in the original language" (requires speaking at the same time as the user).
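One way to see "special cases of model behavior": every micro-turn reduces to a single per-chunk decision among staying silent, speaking, or interjecting, and each mode above is just a branch of that one decision. A purely illustrative sketch, with hand-written trigger conditions standing in for what a real model would learn end to end:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    t: float         # position in the stream, in seconds
    modality: str    # "audio", "video", or "silence"
    content: str

def decide(chunk: Chunk, user_speaking: bool) -> str:
    """One policy evaluated at every micro-turn; each 'mode' is a branch.
    All triggers here are invented placeholders, not the real mechanism."""
    if chunk.modality == "video" and "bug" in chunk.content:
        return "interject"   # visual-cue reaction: fires with no audio trigger
    if user_speaking and chunk.modality == "audio" and chunk.content.startswith("es:"):
        return "speak"       # simultaneous speech: translation overlaps the user
    if chunk.modality == "silence" and chunk.t % 4 < 0.2:
        return "speak"       # time-aware speech: e.g. a cue every 4 seconds
    return "stay_silent"
```

The point of the sketch is that no separate harness exists per mode: a visual cue, overlapping speech, and a timed reminder all flow through the same per-chunk decision.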

The model implicitly tracks whether the speaker is thinking, yielding, self-correcting, or inviting a response — no separate dialog-management component.

Concurrent non-speech action

While listening and speaking, the model can simultaneously call tools, search, browse, or generate UI, weaving the results back into the conversation when appropriate. Deeper or longer-running tasks are delegated to the background model.
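A minimal asyncio sketch of that shape (names, timings, and the tool itself are invented): a slow tool call is fired as a background task while the dialogue stream keeps flowing, and its result is folded back in once ready.

```python
import asyncio

async def tool_search(query: str) -> str:
    # Stand-in for slow background work: search, browse, code execution, ...
    await asyncio.sleep(0.05)
    return f"result for '{query}'"

async def converse() -> list:
    """Keep talking while a tool call runs concurrently, then weave it in."""
    transcript: list = []
    task = asyncio.create_task(tool_search("full duplex"))  # fire off, don't block
    for line in ["Sure, looking that up.", "Meanwhile, about your earlier point..."]:
        transcript.append(line)         # conversation continues uninterrupted
        await asyncio.sleep(0.03)
    transcript.append(await task)       # fold the tool result back in when ready
    return transcript

lines = asyncio.run(converse())
```

A blocking design would go silent for the duration of the tool call; here the speech stream and the tool call share the same event loop, which is the concurrency pattern the section describes.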

Prior art it builds on

Audio full-duplex models are the existing example of bidirectional, continuous interaction; robotics and autonomous vehicles are cited as domains where real-time perception and action are a given. Interaction models apply the same principle across all modalities.

§ end
About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

6 articles link here
  • Concept · Encoder-Free Early Fusion

    Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch h…

  • Concept · Interaction Models

    Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…

  • Concept · Interactivity Benchmarks

    FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (vis…

  • Concept · Time-Aligned Micro-Turns

    The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; s…

  • Entity · TML-Interaction-Small

    TML's first interaction model: 276B MoE / 12B active, audio+video+text in / text+audio out, 200ms micro-turns, async ba…

  • Concept · Turn-Based Interface Bottleneck

    Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out b…

Related articles
  • Concept · Interaction Models

    Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via…

  • Concept · Time-Aligned Micro-Turns

    The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; s…

  • Concept · Interaction / Background Model Split

    Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tool…

  • Concept · The Bitter Lesson

    Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolv…

  • Entity · Thinking Machines Lab

    AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-session…