Howardism · Vol. 03 · Plate II · No. 02
LLM Architecture, tagged.
Notes: 16 · Tag: LLM Architecture · Oldest: 10 Apr 2026 · Newest: 23 May 2026
Every article tagged llm architecture, newest first.
| Title | Summary | Date |
|---|---|---|
| Agent-Native Infrastructure | The world is still built for humans and must be rewritten for agents; "what do I copy-paste to my agent?"; sensors/actuators; agent-to-agent representation | |
| Agentic Loops Overtake Bespoke Systems | DeepMind's *basic* Ralph-loop agent matched its bespoke evolutionary+AlphaProof system as the LLM improved; the bitter lesson / harness-shrinkage confirmed in formal math | |
| Jagged Intelligence (Ghosts, Not Animals) | "Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the loop, treat as tools | |
| Software 3.0 | Karpathy's taxonomy: 1.0 code, 2.0 weights, 3.0 prompting; LLM as programmable interpreter; MenuGen "shouldn't exist"; neural-net-as-host-process extrapolation | |
| The Verifiability Thesis | LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peaks; "verifiable + labs care"; everything eventually verifiable | |
| Encoder-Free Early Fusion | Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out, all co-trained from scratch in one transformer | |
| Interaction / Background Model Split | Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency" | |
| Interaction Models | Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via harness; interactivity scales with intelligence only if it's in the model | |
| The Bitter Lesson | Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolving harnesses into models; caveat — mechanical verification and character may not migrate inward | |
| Time-Aligned Micro-Turns | The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; streaming-sessions inference (upstreamed to SGLang), latency-tuned MoE kernels, bitwise trainer-sampler alignment | |
| Turn-Based Interface Bottleneck | Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out by the interface, not the work; less-intelligent harness (VAD/turn-detection) should dissolve | |
| Context Window Smart Zone | Smart zone vs dumb zone (Dex Horthy / Matt Pocock): quadratic attention scaling, ~100K-token marker independent of advertised context; clear-and-restart > compaction; status-line token counting as essential discipline | |
| Harness Shrinkage as Models Improve | Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from now" claim; mechanical verification stays load-bearing | |
| Client-Side Agent Optimization | AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server-side serving; the combo abstraction; 13–32× cost gaps between best/worst combinations | |
| Agent Harness Engineering | Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical architecture enforcement, agent code review | |
| LLM-as-Compiler Knowledge Base | Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4-phase ingest→compile→query→lint pipeline | |