Summary#
Claude Sonnet 5 is Anthropic's "most agentic Sonnet yet" (announced July 2, 2026), a direct upgrade to Sonnet 4.6 that makes plans, uses tools (browsers, terminals), and runs autonomously "at a level that, just a few months ago, required larger and more expensive models." Anthropic's positioning: the Sonnet class started the agentic era (3.5 / 3.6 / 3.7 were the first models with strong coding + tool use), but recent agentic gains had concentrated in the Opus class — Sonnet 5 narrows that gap, landing close to Opus 4.8 at lower prices. API model id: claude-sonnet-5. As a first-party release announcement this is a vendor-claim source; benchmark deltas below are Anthropic's own and the fuller evaluation lives in the Claude Sonnet 5 System Card.
Pricing and identity#
- Introductory pricing (through Aug 31, 2026): $2 / M input tokens, $10 / M output tokens.
- Standard pricing (from Sep 1, 2026): $3 / M input, $15 / M output — vs Opus 4.8 at $5 / $25.
- API model id: claude-sonnet-5; the default model for Free and Pro plans, and available to Max, Team, Enterprise, in Claude Code, and on the Claude Platform.
- Rate limits raised across Chat, Cowork, Claude Code, and the Claude Platform to accommodate higher-effort token usage.
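The two price tiers above reduce to simple arithmetic. A minimal sketch, assuming the listed per-million-token prices and the Sep 1, 2026 cutover; the helper name `sonnet5_cost_usd` and the example token counts are illustrative and not part of any Anthropic SDK:

```python
from datetime import date

# Per-million-token prices in USD as listed above: (input, output).
INTRO_PRICE = (2.00, 10.00)      # through Aug 31, 2026
STANDARD_PRICE = (3.00, 15.00)   # from Sep 1, 2026
STANDARD_START = date(2026, 9, 1)


def sonnet5_cost_usd(input_tokens: int, output_tokens: int, on: date) -> float:
    """Estimate the cost of one workload on a given date (hypothetical helper)."""
    in_price, out_price = INTRO_PRICE if on < STANDARD_START else STANDARD_PRICE
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Illustrative workload: 400k input tokens and 60k output tokens.
intro = sonnet5_cost_usd(400_000, 60_000, date(2026, 8, 15))
standard = sonnet5_cost_usd(400_000, 60_000, date(2026, 10, 1))
print(f"intro: ${intro:.2f}, standard: ${standard:.2f}")
# prints: intro: $1.40, standard: $2.10
```

Both input and output prices rise by the same factor at the cutover, so any fixed workload costs 1.5× more under standard pricing, before accounting for tokenizer changes.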
Capability profile#
The headline claim is cost-performance range, not a single peak. Anthropic frames Sonnet 5 against Sonnet 4.6 (its predecessor) and Opus 4.8 (a more capable reference model) on two agentic evals — BrowseComp (agentic search) and OSWorld-Verified (computer use):
- Strict improvement over Sonnet 4.6 across reasoning, tool use, coding, and knowledge work.
- Wider cost-performance range than Opus 4.8, tunable via the effort parameter (up to xhigh): substantially better cost-efficiency at medium effort, and higher-effort runs "can match Opus 4.8 on some tasks." The pitch is that between Sonnet 5 and Opus 4.8 a user dials effort to find the right cost/performance balance, a first-party instance of the effort/budget lever in Client-Side Agent Optimization.
- Early-access partners reported it "finishes complex tasks where previous Sonnet models would stop short" and "checks its own output without explicitly being asked": spontaneous self-verification, a capability the shrinking-harness thesis predicts (the verification scaffold the harness used to supply migrates into the model).
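The effort dial described above amounts to a constrained choice: run the cheapest configuration that still clears a quality bar. A minimal sketch of that selection, assuming you have measured cost and task score per effort level on your own eval set. Every number below is a placeholder rather than a published result, the "high" level name is an assumption (the source names only medium and xhigh), and `pick_cheapest` is a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    label: str        # display label, not an API model id
    effort: str       # effort level used for the run
    cost_usd: float   # mean cost per task on your own eval set
    score: float      # mean task success rate in [0, 1]


def pick_cheapest(runs: list[Measurement], min_score: float) -> Optional[Measurement]:
    """Return the cheapest configuration whose score meets the bar, or None."""
    eligible = [r for r in runs if r.score >= min_score]
    return min(eligible, key=lambda r: r.cost_usd, default=None)


# Placeholder measurements for illustration only; replace with your own data.
runs = [
    Measurement("Sonnet 5", "medium", cost_usd=0.20, score=0.70),
    Measurement("Sonnet 5", "high", cost_usd=0.45, score=0.78),
    Measurement("Sonnet 5", "xhigh", cost_usd=0.90, score=0.82),
    Measurement("Opus 4.8", "high", cost_usd=1.10, score=0.84),
]

choice = pick_cheapest(runs, min_score=0.80)
print(choice)
```

With these placeholder values the xhigh Sonnet 5 run is selected; whether that holds on real workloads depends entirely on the measured crossover point.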
(The head-to-head benchmark table vs Sonnet 4.6 and Opus 4.8 is published only as an image in the source and is not transcribed here.)
Benchmark errata (methodology, not model): a June 30, 2026 changelog corrected the BrowseComp cost-performance chart, which had underestimated Sonnet 5, to use the standard agentic-search methodology (10M-token budget with compaction + programmatic tool calling). Sonnet 4.6 baselines were also restated after eval-methodology changes: Humanity's Last Exam re-graded to 34.6% (no tools) / 46.8% (tools), and OSWorld-Verified to 78.5%. These numbers therefore differ from the Sonnet 4.6 launch blog.
Token-economics (migration hazard)#
Sonnet 5 uses an updated tokenizer — the same kind of change Opus 4.7 introduced — so the same input maps to roughly 1.0–1.35× more tokens depending on content type. Anthropic set the introductory pricing so the transition from Sonnet 4.6 is "roughly cost-neutral." As with Opus 4.7, real-traffic token inflation is content-dependent and worth measuring rather than assuming (cross-reference the context-window-as-primary-constraint theme in Claude Code Best Practices).
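Because the multiplier varies by content type, one way to check the cost-neutrality claim is to count the same sample of traffic under both tokenizers and compare. A minimal sketch, assuming you already have paired token counts per content type (how you obtain them depends on your tooling); the sample counts are placeholders, and both helper names are hypothetical:

```python
def inflation_by_type(samples: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map content type -> new_tokens / old_tokens, skipping empty samples."""
    return {kind: new / old for kind, (old, new) in samples.items() if old > 0}


def effective_price(price_per_m: float, multiplier: float) -> float:
    """Price per million old-tokenizer tokens once inflation is applied."""
    return price_per_m * multiplier


# Placeholder paired counts (old tokenizer, new tokenizer); not real measurements.
samples = {
    "english_prose": (100_000, 104_000),
    "source_code": (100_000, 128_000),
}

for kind, m in inflation_by_type(samples).items():
    intro = effective_price(2.00, m)      # introductory input price
    standard = effective_price(3.00, m)   # standard input price
    print(f"{kind}: x{m:.2f} -> intro ${intro:.2f}/M, standard ${standard:.2f}/M")
```

At the top of the stated range (1.35×), the introductory input price works out to $2.70 per million old-tokenizer tokens and the standard price to $4.05, which is why the multiplier on your own traffic matters more after the introductory period ends.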
Safety and alignment profile#
Anthropic's pre-deployment evaluations report Sonnet 5 as an overall improvement on Sonnet 4.6:
- Agentic safety: better at refusing malicious requests and resisting hijack attempts in prompt-injection attacks.
- Honesty: lower rates of hallucination and sycophancy than Sonnet 4.6.
- Automated behavioral audit (cooperation-with-misuse, deception, and other misaligned behaviors across many contexts): Sonnet 5 scored lower (safer) overall than Sonnet 4.6, but higher (worse) than the more capable Opus 4.8 and Claude Mythos Preview. This is the mirror image of the usual worry: here the more capable models are better aligned on the audit, so Sonnet 5's residual misalignment looks like a mid-tier artifact rather than a frontier one, though the cause is unresolved (see Open questions).
Cyber capabilities and safeguards#
Sonnet 5 was not deliberately trained on cybersecurity tasks (contrast Opus 4.7, whose cyber capability was differentially reduced during training). It can do routine, non-harmful cyber work, but performs "substantially poorer" than Opus 4.8 and Mythos 5 on dangerous tasks like exploit development.
- Firefox exploit eval (built with Mozilla; all vulnerabilities patched in Firefox 148): both Sonnet models scored 0.0% at developing a working exploit. Sonnet 5 showed a slightly higher partial-success rate than Sonnet 4.6 — which Anthropic attributes to general-intelligence gains, not cyber-specific training.
- Default safeguards on. Because Sonnet 5 is somewhat stronger than 4.6 here, it launches with the same real-time cyber safeguards as Opus 4.7 and 4.8 (detect-and-block prohibited/high-risk cyber usage at inference). Because the model is judged low-risk, these safeguards are less strict than those launched with Fable 5 (which block a much wider range of cyber tasks and fall back to a weaker model). Legitimate security researchers route through the Cyber Verification Program; Anthropic recommends Opus 4.8 for cyber work needing reduced guardrails.
This places Sonnet 5 as a distinct point on the safeguard spectrum mapped in Capability-Gated Model Fallback: no deliberate train-down (its low cyber capability is native), inference-time detect-and-block at the Opus-4.7/4.8 strictness level, and no model-swap fallback — narrower than Fable 5's classifier-plus-fallback regime because the underlying uplift risk is judged low.
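The spectrum can be written down as a small comparison table. A minimal sketch that records only what this note states, using None wherever the source is silent; the `CyberSafeguards` type, its field names, and `known_differences` are hypothetical and exist only for illustration:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class CyberSafeguards:
    trained_down: Optional[bool]      # cyber capability reduced during training
    detect_and_block: Optional[bool]  # real-time detect-and-block at inference
    model_fallback: Optional[bool]    # blocked requests fall back to a weaker model
    # None means this note does not say either way.


SPECTRUM = {
    "Opus 4.7": CyberSafeguards(True, True, None),
    "Sonnet 5": CyberSafeguards(False, True, False),
    "Fable 5": CyberSafeguards(None, True, True),
}


def known_differences(a: str, b: str) -> list[str]:
    """List fields where both models have a recorded value and the values differ."""
    x, y = SPECTRUM[a], SPECTRUM[b]
    diffs = []
    for f in fields(CyberSafeguards):
        left, right = getattr(x, f.name), getattr(y, f.name)
        if left is not None and right is not None and left != right:
            diffs.append(f.name)
    return diffs


print(known_differences("Sonnet 5", "Fable 5"))
print(known_differences("Sonnet 5", "Opus 4.7"))
```

The booleans flatten a difference of degree: Fable 5's blocking also covers a much wider range of cyber tasks, which a yes/no field cannot express.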
Availability#
Available everywhere from launch (July 2, 2026): default for Free/Pro, available to Max/Team/Enterprise, in Claude Code, and on the Claude Platform (native, AWS, Microsoft Foundry; Google Vertex "coming soon" for the Cyber Verification Program). API id claude-sonnet-5.
Connections#
- Claude Opus 4.8 — the capability ceiling Sonnet 5 is measured against: "close to Opus 4.8 at lower prices," matching it at higher effort on some tasks, and safer-than-Sonnet-5 on the behavioral audit; the model Anthropic recommends over Sonnet 5 for reduced-guardrail cyber work
- Claude Opus 4.7 — precedent for the two migration-relevant changes: the 1.0–1.35× tokenizer inflation and the default real-time cyber safeguards Sonnet 5 inherits
- Claude Fable 5 — the stricter end of the safeguard spectrum; Sonnet 5's cyber safeguards are explicitly "less strict than those launched with Fable 5"
- Claude Mythos 5 — cyber-capability reference point Sonnet 5 falls far short of
- Mythos Model — Mythos Preview is the best-aligned reference on the behavioral audit that Sonnet 5 trails
- Anthropic — vendor
- Claude Code — primary agentic runtime; Sonnet 5 ships as an available model there at launch
- Capability-Gated Model Fallback — the safeguard-spectrum framing Sonnet 5 adds a low-risk, no-fallback point to
- Client-Side Agent Optimization — Sonnet 5's effort-level cost-performance tuning is a first-party instance of the model-per-role / budget / routing lever
- Harness Shrinkage as Models Improve — "checks its own output without being asked" is verification scaffolding migrating from harness into model
- Agentic Prompt Injection — improved hijack-resistance is a headline agentic-safety gain
- Automated Behavioral Audit — the alignment evaluation Sonnet 5 is scored on (safer than 4.6, worse than Opus 4.8 / Mythos Preview)
- LLM-Driven Vulnerability Research — the cyber-capability axis on which Sonnet 5 is natively weak (not trained down); the Firefox exploit eval is the worked instance
- Responsible Scaling Policy Evaluations — Sonnet 5's pre-deployment safety/capability evals and its low-risk cyber determination
Open questions#
- The head-to-head benchmark numbers vs Sonnet 4.6 and Opus 4.8 are image-only in the source; the System Card has the full set.
- What is the real-world token-inflation multiplier on typical Sonnet 5 traffic (1.0–1.35× is content-dependent), and does "roughly cost-neutral" hold once effort levels rise?
- Why does a mid-tier model show higher behavioral-audit misalignment than the more capable Opus 4.8 and Mythos Preview — a capability-alignment coupling, or a training-recipe difference between the Sonnet and Opus/Mythos lines?
- At what effort level does Sonnet 5 actually match Opus 4.8, and how does the crossover cost compare to just running Opus 4.8?
Sources#
- Introducing Claude Sonnet 5 — Anthropic, "Introducing Claude Sonnet 5" (July 2, 2026; changelog edit June 30, 2026).
evidence: vendor-claim
