Plate IIAI Engineering中文HOWARDISM

Zero Trust for AI Agents

PublishedMay 28, 2026FiledConceptDomainAI EngineeringTagsSecurity Zero Trust Agent Deployment AnthropicReading8 minSourceAI-synthesised

Anthropic's security framework for deploying autonomous agents: trust nothing / verify everything / assume breach, applied across a Foundation→Enterprise→Advanced tier model and an 8-phase implementation workflow

Illustration for Zero Trust for AI Agents

Sources#

Zero Trust for AI Agents

Summary#

Anthropic's May 2026 security framework (eBook) for deploying autonomous agents in the enterprise. It applies the established Zero Trust doctrine — trust nothing, verify everything, assume breach has already occurred — to agentic systems, which existing perimeter- and human-identity-based security models were not designed to handle. The framework's organizing claim: agents face a distinct threat landscape, and "skip one capability and attackers exploit the gap." It is presented as a three-tier capability maturity model (Foundation / Enterprise / Advanced) plus an eight-phase implementation workflow, and is framed throughout as a response to AI-Accelerated Offense.

This is a hub page: the cluster of security concepts below (Least Agency, Blast Radius (Agentic), Agentic Prompt Injection, Agent Supply Chain Risk, Memory and Context Poisoning, Agent Identity and Authentication, Impossible, Not Tedious (Design Test), Autonomous Defense) all reference it as a shared touchstone.

The three Zero Trust principles#

Zero Trust has roots in Stephen Paul Marsh's 1994 doctoral thesis; it gained momentum after perimeter breaches, and was codified by NIST SP 800-207 (2020) and the NSA's Zero Trust Implementation Guides (ZIGs) (2026). Three principles define it:

Never trust and always verify — every access request is authenticated and authorized regardless of origin. An internal request gets the same scrutiny as an external one.
Assume breach — design expecting compromise; limit the damage rather than only preventing intrusion. Segment by identity so one compromise doesn't grant access to others. (This is the Blast Radius (Agentic) containment posture.)
Least privilege — grant only the minimum access for a specific task. OWASP's Least Agency extends this to agents (constraining not just what an agent can access but what each tool can do, how often, and where).

Why agents break existing security models#

Agentic systems differ from traditional software in ways that create new exposure:

Autonomous multi-step execution — agents act without human approval at each step, so a manipulated agent causes harm at machine speed.
Tool access (APIs, databases, file systems, MCP) — a compromised tool stack enables data theft, code execution, sabotage.
Instruction interpretation — ambiguity attackers can exploit (Agentic Prompt Injection).
Context persistence — memory across sessions creates new data-protection needs (Memory and Context Poisoning).
Multi-agent coordination — implicit trust relationships let attackers compromise one agent and pivot.

Traditional identity systems built for human users struggle to accommodate agents, which often run with elevated privileges or shared service accounts — a mismatch that motivates Agent Identity and Authentication.

The three-tier capability model#

Every control in the framework is specified across three tiers. Each tier builds on the prior one (advancing means strengthening, not replacing):

Foundation — minimum viable security for smaller / initial deployments. Crucially, the framework argues AI-accelerated offense has raised the Foundation floor: friction-only controls (rotating long-lived API keys, SMS MFA, rate limits) no longer qualify. Short-lived tokens, cryptographically-rooted identity, identity-based isolation, and automated first-pass triage are now entry requirements.
Enterprise — standard practice for organizations at significant scale; adds depth for multi-deployment complexity and meaningful business impact per compromise.
Advanced — aspirational for most; baseline for high-risk / stringently regulated deployments (national security, regulated finance/health). Hardware-backed identity, confidential computing, continuous authorization, ML-based anomaly detection.

The explicit prediction: "Expect the Advanced tier to become Enterprise standard as the space evolves, and Enterprise to become Foundation." Tiers are a roadmap, not a finish line.

The eight control domains (Part III)#

The tier tables span eight capability areas, each a Zero Trust control surface for agents:

Agent identity & authentication — see Agent Identity and Authentication (cryptographic IDs → X.509 → hardware attestation; short-lived tokens → mTLS → hardware-bound credentials).
Access control & privilege management — RBAC+deny-by-default → ABAC → continuous authorization; static roles → dynamic scoping → JIT/JEA; identity-based isolation → sandboxing → hardware isolation. The enforcement layer for Least Agency and Blast Radius (Agentic).
Observability & auditing — action logging, immutable audit trails, traceability/provenance chains. Instrument dwell time and coverage before anything else.
Behavioral monitoring & response — baselines → anomaly detection → automated response. Rule: automate the bookkeeping around incidents, not the decisions.
Input validation & output controls — input sanitization (schemas, spotlighting, constitutional classifiers) and output filtering; defenses against Agentic Prompt Injection.
Integrity & recovery — version-controlled / signed / immutable configs; rollback → automated rollback → self-healing. Counter-intuitive infra reflex: enable auto-updates because manual approval delay is now the bigger risk.
AI governance policies — acceptable-use + incident response, governance committee, automated policy enforcement; addresses Shadow AI.

The eight-phase implementation workflow (Part IV–V)#

Identify requirements — align security/legal/compliance/business before building.
Manage supply chain risks — AI-BOM, OpenSSF Scorecard, dependency audits, AI vendoring (Agent Supply Chain Risk).
Define agent boundaries — unique identity, approved/prohibited actions, escalation triggers, scope limits, and a deliberate Blast Radius (Agentic) assessment using the Impossible, Not Tedious (Design Test).
Defend against prompt injection — input isolation, constitutional classifiers, limit attack surface (Agentic Prompt Injection).
Secure tool access — tool allow-listing, capability restrictions, parameter validation, sandboxing, approval escalation.
Protect agent credentials — short-lived / hardware-bound / per-agent credentials, JIT, ABAC (Agent Identity and Authentication).
Safeguard agent memory — memory isolation, integrity validation, retention policies (Memory and Context Poisoning).
Measure what matters — dwell time, coverage, explainability, behavioral conformance, detection speed.

Part V extends this to Autonomous Defense — running security operations fast enough to match AI-accelerated adversaries.

Regulatory alignment#

Zero Trust aligns with HIPAA, FINRA, GDPR, FedRAMP, and the EU AI Act; the US requires all federal agencies to adopt Zero Trust by 2027, with published guidance from the US (CISA/NSA/NIST), UK (NCSC), and Australia (Home Affairs). Anthropic notes it was one of the first AI companies to achieve ISO 42001 (responsible-AI) certification.

Connections#

AI-Accelerated Offense — the "why now": compressed exploit timelines are the framework's stated motivation; the Foundation floor was raised in response to it
Least Agency — OWASP extension of least privilege; the framework's authorization principle for agents
Blast Radius (Agentic) — the unit the "assume breach" principle is built to contain
Agent Identity and Authentication — control domain 1; the foundation for every other control
Agentic Prompt Injection — the threat Phase 4 and the input-validation domain defend against
Agent Supply Chain Risk — the threat Phase 2 manages
Memory and Context Poisoning — the threat Phase 7 safeguards against
Impossible, Not Tedious (Design Test) — the standing design-review question applied to every control
Autonomous Defense — Part V; defensive operations at the speed of autonomous threats
MCP and Computer Use — MCP is a named high-risk tool surface (tool poisoning, run-your-own-server)
Claude Code Best Practices — Claude Code's deny-by-default permissions, sandboxing, managed settings are cited throughout as a Zero Trust-aligned reference implementation
Anthropic — publisher of the framework
OWASP — source of the agentic threat taxonomy and the "least agency" term
Agentic Misalignment (AM) — distinct but adjacent: Zero Trust addresses externally-induced agent harm; agentic misalignment is self-motivated harm. Both need the same blast-radius containment

Open Questions#

The framework treats every Claude Code "Pro-tip" as a reference implementation. How much of the framework is vendor-neutral vs. tacitly assuming the Anthropic stack?
"Foundation floor raised" implies a moving baseline. How fast does the tier ladder actually shift, and who arbitrates it (NIST/NSA cadence vs. model-capability cadence)?
The framework is explicit that it is not legal/compliance assurance. Where does self-attested Zero Trust maturity meet auditable regulatory requirement?

Sources#

Zero Trust for AI Agents — Anthropic eBook, Zero Trust for AI Agents: A security framework for deploying autonomous AI agents in the enterprise (2026-05-18)

§ end

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 18

Foundation → Enterprise → Advanced: Is the Agent Access-Control Jump a Cliff?
No cliff — Enterprise (ABAC + dynamic privilege elevation with return-to-baseline + mTLS + sandboxing) is the pragmatic…
Agent Identity and Authentication
The foundation control for agentic Zero Trust: cryptographically-rooted per-agent identity (→X.509→hardware attestation…
Agent Supply Chain Risk
Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B mode…
Agentic Misalignment (AM)
Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD rel…
Agentic Prompt Injection
Direct and indirect injection of malicious instructions into an agent; LLMs cannot reliably distinguish information fro…
AI-Accelerated Offense
Frontier models compress the vulnerability-to-exploit timeline from months to hours at marginal dollar cost; both attac…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Autonomous Defense
Running security operations at the speed of AI-accelerated threats: put a model at the front of the alert queue, automa…
Blast Radius (Agentic)
The potential damage if an agent is compromised; the unit Zero Trust's 'assume breach' posture is built to contain via…
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
Impossible, Not Tedious (Design Test)
Zero Trust design test for agentic security: does a control make the attack impossible, or just tedious? Friction-only…
Least Agency
OWASP term extending least privilege to agents: constrain not just what an agent can access but what each tool can do,…
LLM-Driven Vulnerability Research
Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and An…
MCP and Computer Use
Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slac…
Memory and Context Poisoning
Corruption of persistent agent memory that influences behavior long after the initial injection; includes RAG poisoning…
AI Engineering & Agent Tooling
Map of Content for the ai-engineering domain — 45 concepts. Curated entry point; see Home for all domains.
Open Questions Backlog
_124 pages with open questions, as of 2026-06-19._
OWASP
Open Worldwide Application Security Project; source of the agentic threat taxonomy cited throughout Anthropic's Zero Tr…

Least Agency
OWASP term extending least privilege to agents: constrain not just what an agent can access but what each tool can do,…
Agentic Prompt Injection
Direct and indirect injection of malicious instructions into an agent; LLMs cannot reliably distinguish information fro…
Agent Supply Chain Risk
Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B mode…
Blast Radius (Agentic)
The potential damage if an agent is compromised; the unit Zero Trust's 'assume breach' posture is built to contain via…
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

Least Agency
OWASP term extending least privilege to agents: constrain not just what an agent can access but what each tool can do,…
Agentic Prompt Injection
Direct and indirect injection of malicious instructions into an agent; LLMs cannot reliably distinguish information fro…
Agent Supply Chain Risk
Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B mode…
Blast Radius (Agentic)
The potential damage if an agent is compromised; the unit Zero Trust's 'assume breach' posture is built to contain via…
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…

Cited by 18

Foundation → Enterprise → Advanced: Is the Agent Access-Control Jump a Cliff?
No cliff — Enterprise (ABAC + dynamic privilege elevation with return-to-baseline + mTLS + sandboxing) is the pragmatic…
Agent Identity and Authentication
The foundation control for agentic Zero Trust: cryptographically-rooted per-agent identity (→X.509→hardware attestation…
Agent Supply Chain Risk
Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B mode…
Agentic Misalignment (AM)
Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD rel…
Agentic Prompt Injection
Direct and indirect injection of malicious instructions into an agent; LLMs cannot reliably distinguish information fro…
AI-Accelerated Offense
Frontier models compress the vulnerability-to-exploit timeline from months to hours at marginal dollar cost; both attac…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Autonomous Defense
Running security operations at the speed of AI-accelerated threats: put a model at the front of the alert queue, automa…
Blast Radius (Agentic)
The potential damage if an agent is compromised; the unit Zero Trust's 'assume breach' posture is built to contain via…
Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
Impossible, Not Tedious (Design Test)
Zero Trust design test for agentic security: does a control make the attack impossible, or just tedious? Friction-only…
Least Agency
OWASP term extending least privilege to agents: constrain not just what an agent can access but what each tool can do,…
LLM-Driven Vulnerability Research
Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and An…
MCP and Computer Use
Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slac…
Memory and Context Poisoning
Corruption of persistent agent memory that influences behavior long after the initial injection; includes RAG poisoning…
AI Engineering & Agent Tooling
Map of Content for the ai-engineering domain — 45 concepts. Curated entry point; see Home for all domains.
Open Questions Backlog
_124 pages with open questions, as of 2026-06-19._
OWASP
Open Worldwide Application Security Project; source of the agentic threat taxonomy cited throughout Anthropic's Zero Tr…