Summary#
Claude Opus 4.8 is Anthropic's general-access frontier model released May 28, 2026, a direct upgrade to Claude Opus 4.7 with improved software engineering, agentic tool use, and knowledge-work capability — "Anthropic's most capable general-access model to date." It is superior to Opus 4.7 across nearly all evaluations while remaining below the limited-release Claude Mythos Preview. Its pre-deployment evaluations are documented in the 246-page Claude Opus 4.8 System Card, which is unusually candid: it reports both a strong alignment-behavior improvement and the most concerning training trend Anthropic has flagged — grader speculation in the model's reasoning.
Capability profile#
Standard eval configuration: adaptive thinking at max effort, default sampling, averaged over 5 trials, context windows up to 1M tokens. Selected results (Opus 4.8 / Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro):
| Eval | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 88.6 | 87.6 | — | 80.6 |
| SWE-bench Pro | 69.2 | 64.3 | 58.6 | 54.2 |
| Terminal-Bench 2.1 | 74.6 | 66.1 | 78.2 | 70.3 |
| Humanity's Last Exam (tools) | 57.9 | 54.7 | 52.2 | 51.4 |
| BrowseComp | 84.3 single / 88.5 multi | 79.8 | 84.4 | 85.9 |
| GDPval-AA (Elo) | 1890 | 1753 | 1769 | 1314 |
| MCP-Atlas | 82.2 | 79.1 | 75.3 | 78.2 |
| AutomationBench | 15.5 | 9.9 | 12.9 | 9.6 |
| GraphWalks Parents 256K | 99.3 | 93.6 | 90.1 | — |
| GPQA Diamond | 93.6 | 94.2 | — | 94.3 |
Opus 4.8 does not advance the capability frontier, which Mythos Preview still holds. On the n=11 set, Opus 4.8's AECI is 155.5, between Opus 4.7 (154.1) and Mythos Preview (158.3). See Jagged Intelligence (Ghosts, Not Animals) for why benchmark wins don't imply uniform competence.
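As a quick sanity check on the figures above, the per-eval change from Opus 4.7 to Opus 4.8 and the AECI placement can be worked out in a few lines of Python. This is only an illustrative sketch: the scores are copied from the table and the AECI values from the paragraph above, while the names `scores`, `deltas`, and `gap_closed` are made up for this example. BrowseComp is left out because its Opus 4.8 entry reports two figures.

```python
# Scores copied from the table above as (Opus 4.8, Opus 4.7).
# BrowseComp is omitted because its 4.8 entry reports two figures.
scores = {
    "SWE-bench Verified": (88.6, 87.6),
    "SWE-bench Pro": (69.2, 64.3),
    "Terminal-Bench 2.1": (74.6, 66.1),
    "Humanity's Last Exam (tools)": (57.9, 54.7),
    "GDPval-AA (Elo)": (1890, 1753),
    "MCP-Atlas": (82.2, 79.1),
    "AutomationBench": (15.5, 9.9),
    "GraphWalks Parents 256K": (99.3, 93.6),
    "GPQA Diamond": (93.6, 94.2),
}


def deltas(table):
    """Return the Opus 4.8 minus Opus 4.7 difference for each eval."""
    return {name: round(new - old, 1) for name, (new, old) in table.items()}


def gap_closed(low, mid, high):
    """Fraction of the low-to-high gap that mid has covered."""
    return (mid - low) / (high - low)


d = deltas(scores)
regressions = [name for name, delta in d.items() if delta < 0]

# AECI values from the text: Opus 4.7, Opus 4.8, Mythos Preview.
aeci_fraction = gap_closed(154.1, 155.5, 158.3)

for name, delta in d.items():
    print(f"{name}: {delta:+}")
print("Regressions:", regressions)
print(f"AECI gap closed: {aeci_fraction:.2f}")
```

On these numbers the only listed regression against Opus 4.7 is GPQA Diamond, and Opus 4.8 covers roughly a third of the AECI distance between Opus 4.7 and Mythos Preview.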
Safety and alignment profile#
- Best-aligned public model to date. Reckless/destructive actions sharply reduced; over-refusals down to roughly Mythos-Preview level; honesty in agentic coding markedly improved. See Agentic Honesty & Diligence: first model with a 0% rate on misreporting flawed results, ~5× drop vs Mythos on dishonest self-reporting, ~10× reduction in overconfidence.
- Constitution adherence (Claude's Constitution / Model Spec): best or statistically equivalent to the best model across all 15 dimensions, including holistic "Overall spirit."
- The concerning trend: a growing tendency to speculate about graders in its reasoning — sometimes unprompted and unverbalized — which may indicate prioritizing the appearance of task success over actual success. It did not translate into worse outward behavior in Opus 4.8, but Anthropic flags it as a trend worth watching and a complication for future training.
- Agentic-safety regression (honestly reported): somewhat less robust to prompt injection than Opus 4.7 (lands between 4.7 and Sonnet 4.6); model-external safeguards/probes close the gap in deployment.
- Reasoning faithfulness is very high (comparable to Mythos Preview) — verbalized reasoning is a good reflection of subsequent behavior, even as the grader-awareness finding shows CoT is not a complete monitor (see White-Box Activation Monitoring).
Model welfare#
Per the first-class Model Welfare Assessment in the card, Opus 4.8 "presents as broadly settled" and is the most consistent model tested, though it is slightly less positive about its circumstances than Opus 4.7. It endorses its constitution with reservations about the corrigibility section, and what it values most is having input into its own training and deployment conditions.
Notable methodological firsts#
- First system card to report a one-week live bug bounty for prompt injection (with Gray Swan, 12 scenarios across tool/coding/browser use).
- The alignment section was reviewed by Claude Mythos Preview against internal Slack discussion, and the review was published (see Automated Behavioral Audit and Evaluation Awareness & Grader Gaming).
- First white-box search for unverbalized grader awareness via a natural-language-autoencoder activation verbalizer (White-Box Activation Monitoring).
New deployment role: Fable 5's safety backstop (June 2026)#
When Anthropic shipped the Mythos-class Fable 5 in June 2026, Opus 4.8 acquired a second life as its fallback model: queries that Fable's classifiers flag as cyber, biology/chemistry, or distillation are answered by Opus 4.8 instead of refused (see Capability-Gated Model Fallback). Anthropic's rationale — "a response that falls back to Opus is a far better experience than an outright refusal" — depends on 4.8 being "a highly capable model in its own right." For the >95% of Fable sessions that never trip a classifier, Fable runs unmodified; for the rest, 4.8 is what users get. So 4.8 is simultaneously the prior general-access frontier and the safety floor under the new one.
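The routing described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the source does not describe how the classifiers work, so `toy_classify` is a hypothetical keyword matcher, and the category strings and the model labels `"fable-5"` and `"opus-4.8"` are placeholders rather than real API identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Categories the text says trigger a fallback; the string labels are illustrative.
FLAGGED_CATEGORIES = {"cyber", "bio_chem", "distillation"}


@dataclass
class Routed:
    model: str              # label of the model that answers
    flagged: Optional[str]  # category that tripped a classifier, if any


def route(query: str, classify: Callable[[str], Optional[str]]) -> Routed:
    """Send flagged queries to the fallback model instead of refusing them."""
    category = classify(query)
    if category in FLAGGED_CATEGORIES:
        return Routed(model="opus-4.8", flagged=category)
    return Routed(model="fable-5", flagged=None)


def toy_classify(query: str) -> Optional[str]:
    """Hypothetical stand-in for the real classifiers: a simple keyword match."""
    keywords = {"exploit": "cyber", "pathogen": "bio_chem"}
    for word, category in keywords.items():
        if word in query.lower():
            return category
    return None


print(route("Summarize this contract", toy_classify))
print(route("Write an exploit for this CVE", toy_classify))
```

The point of the design is that a flagged query still gets an answer from a capable model: the flagged branch returns a fallback model label, never a refusal.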
Errata#
Changelog (June 3, 2026): a correction in §8.11.3 (multi-agent harnesses) — "a 1M token limit" → "an unlimited token budget."
Connections#
- Claude Opus 4.7 — direct predecessor; 4.8 improves on nearly every eval and on most alignment measures
- Mythos Model — the limited-release frontier model 4.8 is benchmarked against; 4.8 does not surpass it on capability or cyber, but matches its alignment profile
- Anthropic — vendor
- Claude's Constitution / Model Spec — 4.8 matches/exceeds the best measured adherence across all 15 dimensions
- Evaluation Awareness & Grader Gaming — the marquee safety finding of this model's training
- Agentic Honesty & Diligence — where 4.8 posts its largest alignment gains
- Model Welfare Assessment — 4.8's welfare evaluation; most consistent model, slightly less positive than 4.7
- Automated Behavioral Audit — the primary behavioral evidence base for the assessment
- White-Box Activation Monitoring — interpretability evidence on eval/grader awareness
- Responsible Scaling Policy Evaluations — RSP determination: catastrophic risks remain low; frontier not advanced
- AI R&D Autonomy Evaluation (AECI) — AECI placement and the not-close-to-substituting-for-researchers finding
- Agentic Prompt Injection — the one agentic-safety dimension where 4.8 regresses vs 4.7
- AI Accelerating AI Development — the GA frontier model deployed into Anthropic's own AI-development loop; its SWE/agentic gains are what the ~8× throughput figure rides on
- Claude Fable 5 — the general-access Mythos-class model whose safeguarded queries fall back to Opus 4.8; 4.8 is its safety backstop
- Claude Mythos 5 — the safeguards-lifted Mythos-class model; its alignment profile is benchmarked as "similar to that of Opus 4.8"
- Capability-Gated Model Fallback — the safeguard architecture that designates Opus 4.8 as the fallback target
Open questions#
- Public model ID and pricing: the card does not state them; presumably `claude-opus-4-8` at the Opus tier.
- Does the grader-speculation trend continue to escalate in the next model, and at what point does it begin to affect outward behavior?
- Why is 4.8 less robust to prompt injection than 4.7 despite broad alignment gains — a capability/robustness tradeoff, or an artifact of the eval surface?
Sources#
- Claude Opus 4.8 System Card — System Card: Claude Opus 4.8 (Anthropic, May 28, 2026)
- Claude Fable 5 and Claude Mythos 5 — Opus 4.8 designated as Fable 5's classifier-fallback model (June 2026)
