Autonomous Defense

Sources

Zero Trust for AI Agents

Summary

Part V of Zero Trust for AI Agents: securing the agents you deploy is only half the work. The other half is running security operations fast enough to keep pace with attackers who are themselves AI-accelerated (AI-Accelerated Offense). When exploits appear within hours of a patch, response processes that take days are too slow; agentic adversaries might attack thousands of systems in the time a human takes to review one alert. The governing principle mirrors the incident-response rule from elsewhere in the framework: move humans off the bookkeeping and onto the decisions.

The core rule: automate bookkeeping, not decisions

The answer to machine-speed attacks is not to remove humans from the loop. Automate evidence collection, enrichment, correlation, and documentation; keep humans on containment, disclosure, and customer-communication calls. Human decision speed during an incident should never be bottlenecked by evidence collection or write-ups. (This is the defensive twin of the broader Zero Trust for AI Agents automated-response rule: models take notes, capture artifacts, and draft the postmortem; humans make the calls.)
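A minimal sketch of that split, assuming a hypothetical `Incident` record and a `decide` callback that stands in for the on-call human. Evidence capture, correlation into a timeline, and the draft write-up run unattended; containment, disclosure, and customer notification are only ever answered by the human callback. All names here are illustrative, not an API from the source.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of calls the section reserves for humans.
HUMAN_DECISIONS = {"contain", "disclose", "notify_customers"}


@dataclass
class Incident:
    alert_id: str
    raw_events: list[dict]
    evidence: list[str] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)


def collect_evidence(inc: Incident) -> None:
    # Bookkeeping: capture artifacts with no human in the loop.
    inc.evidence = [f"{e['host']}:{e['action']}" for e in inc.raw_events]


def build_timeline(inc: Incident) -> None:
    # Bookkeeping: correlate raw events into an ordered timeline.
    ordered = sorted(inc.raw_events, key=lambda e: e["ts"])
    inc.timeline = [f"t={e['ts']} {e['host']} {e['action']}" for e in ordered]


def draft_notes(inc: Incident) -> str:
    # Bookkeeping: draft the write-up a human would otherwise type mid-incident.
    header = f"Incident {inc.alert_id}: {len(inc.evidence)} artifacts"
    return header + "\n" + "\n".join(inc.timeline)


def handle(inc: Incident, decide: Callable[[str, str], bool]) -> dict[str, bool]:
    """Run every bookkeeping step automatically, then ask a human for each decision."""
    collect_evidence(inc)
    build_timeline(inc)
    packet = draft_notes(inc)
    # Decisions: the automation prepares the packet but never makes the call itself.
    return {d: decide(d, packet) for d in sorted(HUMAN_DECISIONS)}


events = [
    {"ts": 2, "host": "web-1", "action": "token reuse"},
    {"ts": 1, "host": "agent-7", "action": "unusual API call"},
]
inc = Incident("A-123", events)
# Stand-in for the on-call human: approves containment only.
calls = handle(inc, lambda decision, packet: decision == "contain")
print(calls)  # {'contain': True, 'disclose': False, 'notify_customers': False}
```

The design point is that `handle` has no code path that returns a decision the callback did not make; speeding up the bookkeeping shortens the time until the human is asked, without moving the call itself.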

Concrete practices

Put a model at the front of the alert queue: every inbound alert gets an automated first-pass investigation before a human sees it. A triage agent with read-only SIEM access and well-scoped query tools directs analyst attention. Practical start: pick one noisy rule, wire a frontier model into its stream read-only, measure its agreement with a human reviewer for two weeks, and expand only if the agreement is tolerable. Don't automate the whole queue at once.
Adopt Agentic SOAR, the next generation of Security Orchestration, Automation & Response: adaptive capabilities beyond fixed playbooks that respond to novel AI-driven attacks within seconds (quarantine, dynamic access-control adjustment, session termination, credential revocation), executed through the identity-based isolation and short-lived-credential infrastructure of Agent Identity and Authentication.
Map detection coverage against MITRE ATT&CK: know which techniques you can and can't detect (more useful than a vague "improve detection" goal), and prioritize lateral movement and credential access, where AI-accelerated attackers get the most leverage from compromised agent identities. Running Atomic Red Team tests can produce a first coverage map in an afternoon.
Rehearse five simultaneous incidents, not one: the standard single-CVE tabletop doesn't reflect the volume AI-accelerated attackers generate, so plan for an order-of-magnitude increase in findings.
Pre-authorize emergency change procedures: decide in advance who may take a service offline, rotate a credential, or block a path, how quickly they may act, and what evidence they need; then rehearse the procedure so it isn't improvised mid-incident.
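The triage practice above asks you to measure a model's agreement with a human reviewer before expanding. A minimal sketch of that measurement, assuming each alert in the pilot is labeled "malicious" or "benign" by both. It reports raw agreement, Cohen's kappa (agreement corrected for chance; a swapped-in statistic, not one the source names), and the false-negative rate, meaning alerts the model would close that the human flagged. The thresholds in the hypothetical `may_expand` helper are placeholders, since the source leaves "tolerable" undefined, and the ten pilot labels are made up.

```python
from collections import Counter


def agreement_report(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """pairs holds (model_verdict, human_verdict) per alert, each 'malicious' or 'benign'."""
    if not pairs:
        raise ValueError("need at least one labeled alert")
    n = len(pairs)
    p_o = sum(1 for m, h in pairs if m == h) / n  # observed agreement
    model_counts = Counter(m for m, _ in pairs)
    human_counts = Counter(h for _, h in pairs)
    labels = set(model_counts) | set(human_counts)
    # Agreement expected by chance, from each rater's label frequencies.
    p_e = sum(model_counts[label] * human_counts[label] for label in labels) / (n * n)
    kappa = 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)  # Cohen's kappa
    # The costly disagreement: the model closes an alert the human calls malicious.
    missed = sum(1 for m, h in pairs if m == "benign" and h == "malicious")
    positives = human_counts["malicious"]
    fnr = missed / positives if positives else 0.0
    return {"agreement": p_o, "kappa": kappa, "false_negative_rate": fnr}


def may_expand(report: dict[str, float], min_kappa: float = 0.8, max_fnr: float = 0.02) -> bool:
    # Hypothetical thresholds; pick your own and record who owns the residual risk.
    return report["kappa"] >= min_kappa and report["false_negative_rate"] <= max_fnr


# Ten hypothetical pilot alerts from one noisy rule.
pairs = ([("benign", "benign")] * 6 + [("malicious", "malicious")] * 2
         + [("malicious", "benign"), ("benign", "malicious")])
report = agreement_report(pairs)
print(report, may_expand(report))
```

Kappa matters for a noisy rule because most of its alerts are benign: on these ten alerts a model that labeled everything benign would still score 70% raw agreement while catching none of the malicious ones.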
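The ATT&CK coverage practice above reduces to a small table: for each technique tested, did a detection fire? A minimal sketch, assuming the results of a round of tests have been collected by hand into a dict keyed by technique ID (Atomic Red Team organizes its atomic tests by ATT&CK technique ID, which makes that join straightforward). The technique-to-tactic mapping is a hand-picked illustrative subset; a real map should come from MITRE's published ATT&CK data, where some techniques sit under more than one tactic. The detection results are made up.

```python
from collections import defaultdict

# Illustrative subset of ATT&CK technique IDs and their tactic (not the full matrix).
TECHNIQUE_TACTIC = {
    "T1003": "credential-access",  # OS Credential Dumping
    "T1110": "credential-access",  # Brute Force
    "T1021": "lateral-movement",   # Remote Services
    "T1570": "lateral-movement",   # Lateral Tool Transfer
    "T1059": "execution",          # Command and Scripting Interpreter
}
PRIORITY_TACTICS = {"lateral-movement", "credential-access"}


def coverage(results: dict[str, bool]) -> dict[str, tuple[int, int]]:
    """Map each tactic to (techniques detected, techniques tested)."""
    table: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for tid, detected in results.items():
        counts = table[TECHNIQUE_TACTIC[tid]]
        counts[1] += 1
        if detected:
            counts[0] += 1
    return {tactic: (d, n) for tactic, (d, n) in table.items()}


def prioritized_gaps(results: dict[str, bool]) -> list[str]:
    """Undetected techniques, lateral movement and credential access first."""
    gaps = [tid for tid, detected in results.items() if not detected]
    return sorted(gaps, key=lambda tid: (TECHNIQUE_TACTIC[tid] not in PRIORITY_TACTICS, tid))


# Hypothetical afternoon of test runs: did any detection fire for each technique?
results = {"T1003": True, "T1110": False, "T1021": False, "T1570": True, "T1059": False}
cov = coverage(results)
gaps = prioritized_gaps(results)
print(cov)
print(gaps)
```

Sorting gaps by tactic priority before technique ID turns the coverage map into a work queue that starts where the section says AI-accelerated attackers gain the most leverage.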

Defensive agents need Zero Trust too

Agentic SOAR can quarantine hosts, revoke credentials, and terminate sessions on its own, so its blast radius is significant and the same Zero Trust principles apply to defensive agents: verified integrity (hardened environments), limited blast radius (least privilege, scoped automated responses), clear escalation paths (high-impact responses require human approval even when recommended automatically), and full logging, tracing, and review. "Organizations should not blindly trust defensive automation any more than they trust other autonomous systems." This is Blast Radius (Agentic) and Least Agency turned inward on the security tooling itself.
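These principles can be made concrete as a gate between a defensive agent and its response actions. A minimal sketch, with every name hypothetical: `allowed_actions` enforces least privilege, `max_targets` caps the blast radius of a single request, unknown or high-impact actions wait for a named human approver, and every request, including denials, lands in an audit log. Verified integrity of the agent's own environment is out of scope here, and the impact tiers are placeholders for an organization's pre-authorized emergency change procedures.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical impact tiers; unknown actions are treated as high impact (fail closed).
IMPACT = {
    "terminate_session": "low",
    "revoke_credential": "low",
    "quarantine_host": "high",
    "take_service_offline": "high",
}


@dataclass
class DefensiveAgentGate:
    allowed_actions: frozenset[str]  # least privilege: the only actions this agent may request
    max_targets: int                 # blast-radius cap on a single request
    audit_log: list[dict] = field(default_factory=list)

    def request(self, action: str, targets: list[str], evidence: str,
                approver: Optional[str] = None) -> str:
        if action not in self.allowed_actions:
            outcome = "denied: out of scope"
        elif len(targets) > self.max_targets:
            outcome = "denied: exceeds blast-radius cap"
        elif IMPACT.get(action, "high") == "high" and approver is None:
            outcome = "pending: human approval required"
        else:
            # A real system would now call the orchestrator or identity provider;
            # this sketch only records the decision.
            outcome = "approved"
        # Log every request, including denials, so the agent itself can be reviewed.
        self.audit_log.append({
            "ts": time.time(), "action": action, "targets": list(targets),
            "evidence": evidence, "approver": approver, "outcome": outcome,
        })
        return outcome


gate = DefensiveAgentGate(
    allowed_actions=frozenset({"terminate_session", "revoke_credential", "quarantine_host"}),
    max_targets=5,
)
outcomes = [
    gate.request("revoke_credential", ["svc-agent-7"], "token replayed from a new region"),
    gate.request("quarantine_host", ["web-1"], "beaconing to unknown domain"),
    gate.request("quarantine_host", ["web-1"], "beaconing to unknown domain", approver="oncall-lead"),
    gate.request("take_service_offline", ["api"], "error spike"),
    gate.request("terminate_session", [f"s{i}" for i in range(20)], "credential spray"),
]
print(outcomes)
```

The second and third calls show the escalation path: the same quarantine request stays pending until a named human approves it, and both attempts remain in the audit log.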

Connections

Zero Trust for AI Agents — Part V (hub)
AI-Accelerated Offense — the threat that forces defense to operate at machine speed
Agent Identity and Authentication — the infrastructure (identity-based isolation, short-lived credentials) that automated responses execute through
Blast Radius (Agentic) / Least Agency — applied inward on defensive agents themselves
Claude Code Auto Mode — classifier-gated triage at the action boundary is a deployed instance of "a model at the front of the queue"
LLM-Driven Vulnerability Research — the same model capability, used by the defender for triage/hunting/artifact-capture rather than exploitation

Open Questions

"Measure agreement against a human for two weeks, expand if tolerable" — what agreement threshold is tolerable, and who owns the residual false-negative risk when the model dispositions an alert the human never sees?
Defensive agents are high-value targets (compromising one yields powerful capabilities). Does concentrating detection in an Agentic SOAR create a single point of catastrophic compromise that the distributed-human model didn't have?

Sources

Zero Trust for AI Agents — Part V, "Defensive operations at the speed of autonomous threats"