AI Engineering & Agent Tooling
Map of Content for the ai-engineering domain — 35 concepts. Curated entry point; see Home for all domains.
<!-- BEGIN GENERATED: moc -->
- Agent Harness Engineering — Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical architecture enforcement, agent code review
- Agent Identity and Authentication — The foundation control for agentic Zero Trust: cryptographically rooted per-agent identity (→X.509→hardware attestation), short-lived IdP-issued tokens replacing static API keys (→mTLS→hardware-bound credentials), JIT access and ABAC
- Agent Loop Pattern — `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as a next-generation agent primitive; AFK execution, parallel fan-out, "loops are the future"
- Agent-Native Infrastructure — The world is still built for humans and must be rewritten for agents; "what do I copy-paste to my agent?"; sensors/actuators; agent-to-agent representation
- Agent Supply Chain Risk — Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B model), tool/MCP supply chain (first in-the-wild malicious MCP server), AI-BOM, OpenSSF Scorecard, dependency audits, and AI vendoring as remediation
- Agentic Prompt Injection — Direct and indirect injection of malicious instructions into an agent; LLMs cannot reliably distinguish information from instructions; defenses are spotlighting (50%→<2%), constitutional classifiers (95% blocked), input isolation, and attack-surface reduction
- AI-Accelerated Offense (hub) — Frontier models compress the vulnerability-to-exploit timeline from months to hours at marginal dollar cost; both attackers and defenders speed up, the N-day window collapses, and the differentiator becomes strong fundamentals + breach-ready architecture
- Autonomous Defense — Running security operations at the speed of AI-accelerated threats: put a model at the front of the alert queue, automate the bookkeeping (not the decisions), Agentic SOAR, MITRE ATT&CK coverage mapping, and rehearse five simultaneous incidents
- Blast Radius (Agentic) — The potential damage if an agent is compromised; the unit Zero Trust's 'assume breach' posture is built to contain via identity-based isolation, sandboxing, and compartmentalization
- Building Is Cheap, Arguing Is Expensive — "In technical debate, code wins": generate three PRs vs whiteboard; prototype over design doc; reduce design docs
- Claude Code Auto Mode — Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground between the default mode and `--dangerously-skip-permissions`
- Claude Code Best Practices (hub) — Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→code workflow, environment config
- Client-Side Agent Optimization — AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server-side serving; the combo abstraction; 13–32× cost gaps between best/worst combinations
- Code as Source of Truth — Docs go stale at high coding throughput; check specs/skills into the repo; onboard via Claude; spec-drift verification
- Codex App Server Protocol — JSON-RPC stdio protocol for headless Codex sessions: initialize/initialized/thread-start/turn-start handshake, continuation turns reuse thread_id, dynamic tool calls for token-isolated tool injection
- Compute Allocator — The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding invested in alignment/communication; abundance mindset
- Context Window Smart Zone (hub) — Smart zone vs dumb zone (Dex Horthy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised context; clear-and-restart > compaction; status-line token counting as essential discipline
- Deep Modules for Agents — Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in fresh context; Sandcastle three-agent pattern
- Design Concept Grilling (hub) — Matt Pocock's `grill-me` skill; reach Brooks's "design concept" before any plan; counter to specs-to-code; PRD as destination doc, Kanban as journey doc
- Disposable Micro-Apps — Throwaway custom UIs built per-task to edit a plan ("micro-software on top of micro-software"); copy-back-to-markdown; rational under the abundance mindset
- Harness Shrinkage as Models Improve (hub) — Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from now" claim; mechanical verification stays load-bearing
- HTML as the New Markdown — Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the human; HTML artifacts (visual, interactive) keep humans in the loop. The model-facing harness shrinks while this human-facing harness grows
- Impossible, Not Tedious (Design Test) (hub) — Zero Trust design test for agentic security: does a control make the attack impossible, or just tedious? Friction-only controls degrade against agentic attackers with unlimited patience and near-zero per-attempt cost
- Least Agency — OWASP term extending least privilege to agents: constrain not just what an agent can access but what each tool can do, how often, and where; deny-by-default, per-agent credentials, scope limits
- Living Design System — `design_system.html` extracted from repos as a portable, human- and machine-readable source of truth; component playgrounds; bridges engineering ↔ non-technical stakeholders
- LLM-as-Compiler Knowledge Base — Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4-phase ingest→compile→query→lint pipeline
- MCP and Computer Use — Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slack/Figma + niche industry systems); computer use as the GUI-driving catchall when no MCP exists; Boris Cherny's "to the model, it's just tokens"
- Memory and Context Poisoning — Corruption of persistent agent memory that influences behavior long after the initial injection; includes RAG poisoning, shared-context poisoning, and slow long-term memory drift; defended via memory isolation, integrity validation, and retention policies
- Outsource Your Thinking, Not Your Understanding — "You can outsource your thinking but not your understanding"; understanding as the non-delegable human bottleneck; knowledge bases as understanding-tools
- Ticket-Driven Agent Orchestration — The inversion that makes Symphony work: tickets as units of work (not sessions/PRs), DAG dependencies, agent-extensible work graph, "objectives not transitions"
- The Verifiability Thesis (hub) — LLMs automate what you can verify, just as computers automate what you can specify; RL verification rewards → jagged peaks; "verifiable + labs care"; everything eventually verifiable
- Verification as the New Bottleneck (hub) — Fiona Fung: coding is no longer the bottleneck — verification, review, maintenance are; shift-left; TDD loses its tax; PR-cycle-time funnel analysis
- Vertical Slice Tracer Bullets — Pragmatic-Programmer tracer-bullet pattern applied to agent task decomposition; vertical slices > horizontal layers; Kanban-with-blocking-edges over numbered phase plans
- Vibe Coding vs. Agentic Engineering — Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x and widening"; hire on big projects, not puzzles
- Zero Trust for AI Agents (hub) — Anthropic's security framework for deploying autonomous agents: trust nothing / verify everything / assume breach, applied across a Foundation→Enterprise→Advanced tier model and an 8-phase implementation workflow
<!-- END GENERATED: moc -->
Related articles
- Open Questions Backlog
_62 pages with open questions, as of 2026-05-25._
- Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
