Summary
Part V of Zero Trust for AI Agents: securing the agents you deploy is only half the work — the other half is running security operations fast enough to contend with attackers who are themselves AI-accelerated (AI-Accelerated Offense). When exploits appear within hours of a patch, response processes that take days are too slow; agentic adversaries might attack thousands of systems in the time a human reviews one alert. The governing principle mirrors the incident-response rule from elsewhere in the framework: move humans off the bookkeeping and onto the decisions.
The core rule: automate bookkeeping, not decisions
The answer is not to remove humans from the loop. Automate evidence collection, enrichment, correlation, and documentation; keep humans on containment calls, disclosure calls, and customer-comms calls. Human decision speed during an incident should never be rate-limited by evidence collection or write-ups. (This is the defensive twin of the broader Zero Trust for AI Agents automated-response rule: models take notes, capture artifacts, and draft the postmortem; humans make the calls.)
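One way to picture the split is an incident record where anything may append evidence but a decision is rejected unless a named human owns it. This is a minimal sketch, not part of the framework: `IncidentRecord`, `HUMAN_DECISIONS`, and the method names are hypothetical, and a real system would verify the human's identity rather than accept a string.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumption: the three decision types named in the text above.
HUMAN_DECISIONS = {"containment", "disclosure", "customer_comms"}


@dataclass
class IncidentRecord:
    incident_id: str
    timeline: list[str] = field(default_factory=list)
    decisions: dict[str, str] = field(default_factory=dict)

    def log_evidence(self, source: str, note: str) -> None:
        """Bookkeeping: safe for an automated agent to call."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.timeline.append(f"{stamp} [{source}] {note}")

    def record_decision(self, kind: str, choice: str, decided_by: str) -> None:
        """Decisions: rejected unless a named human owner is supplied."""
        if kind not in HUMAN_DECISIONS:
            raise ValueError(f"unknown decision type: {kind}")
        if not decided_by.strip():
            raise PermissionError("decisions require a named human owner")
        self.decisions[kind] = f"{choice} (decided by {decided_by})"
        # The decision itself also lands in the timeline for the postmortem.
        self.log_evidence("decision", f"{kind}: {choice} by {decided_by}")
```

In this shape the agent can call `log_evidence` as often as it likes, while `record_decision` fails fast if automation tries to file a containment or disclosure call without a human owner.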
Concrete practices
- Put a model at the front of the alert queue: every inbound alert gets an automated first-pass investigation before a human sees it. A triage agent with read-only SIEM access and well-scoped query tools directs analyst attention. Practical start: pick one noisy rule, wire a frontier model into its stream read-only, measure its agreement with a human reviewer for two weeks, and expand only if the disagreement rate is tolerable. Don't automate the whole queue at once.
- Agentic SOAR: the next generation of Security Orchestration, Automation & Response adds adaptive capabilities beyond fixed playbooks, so it can respond to novel AI-driven attacks within seconds. Responses (quarantine, dynamic access-control adjustment, session termination, credential revocation) execute through the identity-based isolation and short-lived-credential infrastructure of Agent Identity and Authentication.
- Map detection coverage against MITRE ATT&CK: know which techniques you can and can't detect, which is more useful than a vague "improve detection" goal. Prioritize lateral movement and credential access, where AI-accelerated attackers get the most leverage from compromised agent identities. Atomic Red Team gives a one-afternoon coverage map.
- Rehearse five simultaneous incidents, not one: the standard one-CVE tabletop doesn't scale; plan for an order-of-magnitude increase in finding volume.
- Pre-authorize emergency change procedures: decide in advance who can take a service offline, rotate a credential, or block a path, how fast, and on what evidence. Practice the path so it isn't improvised mid-incident.
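The two-week agreement measurement in the first practice can be summarized with a few numbers. The sketch below is illustrative only and assumes each alert ends with a binary escalate / don't-escalate verdict from both the model and the human reviewer. `Verdict` and `agreement_report` are hypothetical names. Cohen's kappa is one common way to correct raw agreement for chance; the source does not prescribe a metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    alert_id: str
    model_escalates: bool   # model's first-pass disposition
    human_escalates: bool   # human reviewer's disposition on the same alert


def agreement_report(verdicts: list[Verdict]) -> dict[str, float]:
    """Summarize model-vs-human agreement for a triage pilot."""
    n = len(verdicts)
    if n == 0:
        raise ValueError("no verdicts to compare")

    both = sum(v.model_escalates and v.human_escalates for v in verdicts)
    neither = sum(not v.model_escalates and not v.human_escalates for v in verdicts)
    # Alerts the human escalated but the model would have closed.
    missed = sum(v.human_escalates and not v.model_escalates for v in verdicts)
    model_pos = sum(v.model_escalates for v in verdicts)
    human_pos = sum(v.human_escalates for v in verdicts)

    observed = (both + neither) / n
    # Agreement expected by chance, given each rater's escalation rate.
    expected = (model_pos * human_pos + (n - model_pos) * (n - human_pos)) / (n * n)
    # Convention (an assumption here): report kappa = 1.0 when chance agreement is total.
    kappa = 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)
    miss_rate = missed / human_pos if human_pos else 0.0
    return {"agreement": observed, "kappa": kappa, "miss_rate": miss_rate}
```

The miss rate is reported separately because raw agreement can look high on a noisy rule where most alerts are benign, while the failure that matters most is the alert the model closes and no human ever sees.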
Defensive agents need Zero Trust too
Agentic SOAR's blast radius is significant, so the same Zero Trust principles apply to defensive agents: verified integrity (hardened environments), limited blast radius (least privilege, scoped automated responses), clear escalation paths (high-impact responses require human approval even when recommended automatically), and full logging/tracing/review. "Organizations should not blindly trust defensive automation any more than they trust other autonomous systems" — this is Blast Radius (Agentic) and Least Agency turned inward on the security tooling itself.
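A minimal sketch of how those principles might combine in a single authorization check: a scoped allowlist of responses, mandatory human approval for high-impact ones, and a log line for every decision. Everything here is an assumption for illustration, including the action names, the `Impact` levels, and the `authorize` function. Which responses count as high-impact is a policy choice the source leaves to the organization.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional

logger = logging.getLogger("defensive_agent")


class Impact(Enum):
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class ResponseAction:
    name: str
    target: str
    impact: Impact


# Hypothetical scope: the only responses this agent may ever request.
ALLOWED_ACTIONS = {"terminate_session", "revoke_credential", "quarantine_host"}


def authorize(action: ResponseAction, approved_by: Optional[str] = None) -> str:
    """Return 'execute', 'escalate', or 'deny' for a proposed response."""
    if action.name not in ALLOWED_ACTIONS:
        decision = "deny"        # outside the agent's scoped responses
    elif action.impact is Impact.HIGH and approved_by is None:
        decision = "escalate"    # recommended automatically, approved by a human
    else:
        decision = "execute"
    # Every decision is logged so it can be traced and reviewed later.
    logger.info(
        "action=%s target=%s impact=%s decision=%s approved_by=%s",
        action.name, action.target, action.impact.name, decision, approved_by,
    )
    return decision
```

For example, `authorize(ResponseAction("quarantine_host", "host-7", Impact.HIGH))` returns `"escalate"` until a human approver is supplied, while an action outside the allowlist is denied regardless of who approves it.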
Connections
- Zero Trust for AI Agents — Part V (hub)
- AI-Accelerated Offense — the threat that forces defense to operate at machine speed
- Agent Identity and Authentication — the infrastructure (identity-based isolation, short-lived credentials) that automated responses execute through
- Blast Radius (Agentic) / Least Agency — applied inward on defensive agents themselves
- Claude Code Auto Mode — classifier-gated triage at the action boundary is a deployed instance of "a model at the front of the queue"
- LLM-Driven Vulnerability Research — the same model capability, used by the defender for triage/hunting/artifact-capture rather than exploitation
Open Questions
- "Measure agreement against a human for two weeks, expand if tolerable" — what agreement threshold is tolerable, and who owns the residual false-negative risk when the model dispositions an alert the human never sees?
- Defensive agents are high-value targets (compromising one yields powerful capabilities). Does concentrating detection in an Agentic SOAR create a single point of catastrophic compromise the distributed-human model didn't have?
Sources
- Zero Trust for AI Agents — Part V, "Defensive operations at the speed of autonomous threats"
