Sources#
Summary#
Claude Opus 4.7 is Anthropic's generally available frontier model, released as a direct upgrade to Opus 4.6 (same pricing: $5/M input tokens, $25/M output tokens; model ID claude-opus-4-7). It improves on advanced software engineering, literal instruction following, high-resolution vision, and file-system memory, while remaining less broadly capable than the limited-release Claude Mythos Preview. It is the first model to ship Mythos-class cyber safeguards under Project Glasswing.
Details#
Capability Deltas vs. Opus 4.6#
- Software engineering on hardest tasks: marketed explicitly for "hand off your hardest coding work." SOTA on Finance Agent, GDPval-AA; improved on SWE-bench Verified/Pro/Multilingual (improvement holds after excluding memorization-flagged problems).
- Instruction following: substantially more literal. Anthropic warns that prompts tuned for earlier models "can sometimes now produce unexpected results" because Opus 4.7 no longer skips or loosely interprets parts of a prompt. Retuning is a required migration step, not optional.
- Multimodal: accepts images up to 2,576 px on long edge (~3.75 MP, >3× prior Claude models). Enables dense-screenshot reading (computer-use), complex-diagram extraction, pixel-precise references. Model-level change, not an API parameter.
- File-system memory: better at using filesystem-backed memory across long multi-session work; needs less up-front context on follow-up tasks.
- Safety: similar overall profile to 4.6. Better on honesty and prompt-injection resistance; modestly weaker on over-detailed harm-reduction advice for controlled substances. "Largely well-aligned and trustworthy, though not fully ideal." Mythos Preview remains the best-aligned model by Anthropic's evaluations.
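The 2,576 px long-edge figure in the multimodal bullet above lends itself to a small pre-flight check. The following is a minimal sketch, assuming a client wants to downscale oversized images itself before upload; whether client-side resizing is needed at all is an assumption, and `fit_long_edge` is a hypothetical helper, not part of any Anthropic SDK.

```python
MAX_LONG_EDGE_PX = 2576  # long-edge limit quoted in the capability list above


def fit_long_edge(width: int, height: int,
                  max_long_edge: int = MAX_LONG_EDGE_PX) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side fits the limit.

    Images already within the limit are returned unchanged, and the
    aspect ratio is preserved up to rounding to whole pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_long_edge / long_edge))
    new_height = max(1, round(height * max_long_edge / long_edge))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_long_edge(5152, 2898))  # landscape image, twice the limit
    print(fit_long_edge(1920, 1080))  # already within the limit
```

The helper only computes target dimensions; the actual resampling would be done by whatever imaging library the project already uses.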
Token-Economics Changes (Migration Hazard)#
Two compounding effects increase token consumption:
- Updated tokenizer: the same input maps to 1.0–1.35× as many tokens, depending on content type.
- More thinking at higher effort levels, particularly on later turns in agentic settings, which produces more output tokens.
Anthropic claims the net effect is favorable on their internal coding eval across effort levels, but explicitly recommends measuring on real traffic. Users can offset the increase via the effort parameter, task budgets, or explicit conciseness prompting. This bears directly on the context-window-as-primary-constraint theme in Claude Code Best Practices; cross-reference the brevity-constraint findings in Scale-Dependent Prompt Sensitivity.
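To make the tokenizer effect concrete, here is a rough cost-bounding sketch. It uses only the prices and the 1.0–1.35× range stated in this article. The function names are hypothetical, and holding the output token count fixed is a simplifying assumption, since the article also notes that output can grow at higher effort levels.

```python
INPUT_USD_PER_MTOK = 5.00    # input price quoted in the summary
OUTPUT_USD_PER_MTOK = 25.00  # output price quoted in the summary
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)  # content-dependent range from this section


def request_cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost of one request at the per-million-token prices above."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def cost_bounds_usd(old_input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Lower and upper cost bounds after re-tokenizing the same input.

    Assumption: only the input count is inflated; output tokens are
    held fixed, which understates cost if the model also thinks longer.
    """
    low_mult, high_mult = TOKENIZER_MULTIPLIER_RANGE
    return (request_cost_usd(old_input_tokens * low_mult, output_tokens),
            request_cost_usd(old_input_tokens * high_mult, output_tokens))


if __name__ == "__main__":
    low, high = cost_bounds_usd(old_input_tokens=100_000, output_tokens=4_000)
    print(f"${low:.3f} to ${high:.3f} per request")
```

For a request that previously counted 100,000 input tokens and produces 4,000 output tokens, the bounds work out to $0.600 and $0.775. These are bounds, not a measurement, which is why measuring on real traffic remains the recommendation.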
Effort Levels#
Introduces a new xhigh ("extra high") effort level sitting between high and max. Tradeoff surface: reasoning depth vs. latency/tokens on hard problems.
- Claude Code default raised to xhigh on all plans.
- Anthropic recommends starting coding/agentic use at high or xhigh.
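The ordering described above (xhigh sits between high and max) can be captured in a small client-side policy. This is an illustrative sketch only: it models just the three levels named in this article, the token-ceiling rule is an invented heuristic, and neither function corresponds to a real API parameter or SDK call.

```python
# Only the effort levels named in this article, in ascending order.
EFFORT_LEVELS = ("high", "xhigh", "max")


def lower_effort(current: str) -> str:
    """Step down one level, staying at the lowest level if already there."""
    index = EFFORT_LEVELS.index(current)  # raises ValueError for unknown levels
    return EFFORT_LEVELS[max(index - 1, 0)]


def choose_effort(observed_tokens: int, token_ceiling: int,
                  current: str = "xhigh") -> str:
    """Keep the current level while spend is within the ceiling.

    Hypothetical policy: if the last run exceeded the ceiling, drop one
    level for the next run instead of jumping straight to the lowest.
    """
    if observed_tokens <= token_ceiling:
        return current
    return lower_effort(current)


if __name__ == "__main__":
    print(choose_effort(observed_tokens=120_000, token_ceiling=100_000))
```

Starting at xhigh and stepping down only on observed overspend follows the "start at high or xhigh" guidance while keeping the tradeoff between reasoning depth and tokens explicit.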
Cyber Capabilities and Safeguards#
- Opus 4.7 is the first post-Glasswing model and ships with safeguards "that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses."
- Cyber capabilities were differentially reduced during training (not merely filtered at inference).
- Still less capable than Mythos Preview on cyber; CyberGym score updated (a harness improvement moved the Opus 4.6 baseline from 66.6 to 73.8).
- Legitimate security researchers (vuln research, pentest, red-teaming) are routed through the new Cyber Verification Program rather than default access.
This directly fulfills the roadmap promise stated in LLM-Driven Vulnerability Research: "Upcoming Claude Opus model will ship with new safeguards developed against Mythos-class outputs."
Accompanying Launches#
- Task budgets (public beta, API): developer-guided token-spend allocation across longer runs — a server-surfaced analogue to the budget lever in Client-Side Agent Optimization's combo space.
- /ultrareview slash command in Claude Code: dedicated review session that reads changes and flags bugs/design issues. Pro and Max users get three free ultrareviews.
- Auto mode extended to Max users (previously Team-only research preview).
Availability#
- All Claude products, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
- API model ID: claude-opus-4-7.
- Pricing unchanged from Opus 4.6.
Connections#
- Claude Code Best Practices — Opus 4.7 is the runtime most Claude Code work will target; its literal-instruction-following and tokenizer inflation amplify the context-window-as-primary-constraint framing
- Claude Code Auto Mode — auto mode was already extended to Opus 4.6; Opus 4.7 ships with it extended to Max users
- LLM-Driven Vulnerability Research — Opus 4.7 operationalizes the "safeguards developed against Mythos-class outputs" commitment from the Mythos Preview disclosure
- Client-Side Agent Optimization — the improved instruction-following may reduce Opus-as-planner failures documented on 4.6 (open question); task budgets echo AgentOpt's budget lever server-side
- Scale-Dependent Prompt Sensitivity — literal instruction following might dampen elaboration-driven overthinking, but xhigh-default and "thinks more at higher effort" cut the other way. Needs empirical recheck before assuming brevity findings carry over
- Agent Harness Engineering — better file-system memory strengthens the case for repo-local versioned artifacts as the agent's primary memory surface
- Mythos Model — preview-tier successor used internally; Boris Cherny: "we use a little bit of Mythos and a lot of Opus 4.7"
- Harness Shrinkage as Models Improve — Opus 4.7 is the model whose spontaneous loop-starting and natural to-do-list use motivate the shrinkage thesis; Cat Wu's pruning discipline runs at every release of this lineage
- Agent Loop Pattern — /loop becomes natural model behavior at 4.7 per Boris Cherny's report
- Claude Code — primary product surface targeting this model
- Model Spec Midtraining (MSM) — Opus 4.6/4.7 used by the May 2026 MSM paper as the data-generation model for synthetic spec documents and AFT data
- Synthetic Document Finetuning (SDF) — Opus is the workhorse generator for SDF/MSM corpora across Anthropic alignment work
- TML-Interaction-Small — era-mate (mid-2026 frontier from a different lab); 4.7's xhigh effort tier mirrors the minimal/xhigh tiers of GPT-realtime-2.0 used as a baseline in TML's interaction benchmarks
Open Questions#
- Do Hakim's (2026) brevity-constraint findings on Opus 4.6 replicate on Opus 4.7, or does the literal-instruction-following change the elasticity? Specifically: does <50 words still yield +13.1pp on GSM8K?
- Does Opus 4.7 still underperform as a planner in HotpotQA-style combo sweeps, or does improved instruction-following close the gap that AgentOpt (Hua et al., 2026) identified?
- What is the real-world token-inflation multiplier on typical Claude Code sessions (1.0–1.35× is content-dependent — what's the distribution on code-heavy vs. prose-heavy inputs)?
- How does xhigh compare to max on coding evals? The migration guidance says "start with high or xhigh" — is max ever worth it for coding?
- What fraction of existing CLAUDE.md / system-prompt hedges become counterproductive under literal instruction following?
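The token-inflation question above is answerable with paired counts. As a sketch, assuming you can obtain token counts for the same inputs under both the old and the new tokenizer (how you obtain them is left open here), the per-content-type distribution could be summarized as follows. The function name and the sample numbers are invented for illustration, not measurements.

```python
from collections import defaultdict
from statistics import median


def inflation_by_type(samples):
    """Summarize new/old token-count ratios per content type.

    `samples` is an iterable of (content_type, old_tokens, new_tokens).
    Rows with a non-positive old count are skipped to avoid dividing by zero.
    """
    ratios = defaultdict(list)
    for content_type, old_tokens, new_tokens in samples:
        if old_tokens > 0:
            ratios[content_type].append(new_tokens / old_tokens)
    return {
        content_type: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for content_type, values in ratios.items()
    }


if __name__ == "__main__":
    # Made-up counts purely to show the shape of the output.
    demo = [
        ("code", 1000, 1300),
        ("code", 2000, 2400),
        ("prose", 1000, 1050),
    ]
    for content_type, stats in inflation_by_type(demo).items():
        print(content_type, stats)
```

Reporting min, median, and max per content type, rather than a single average, keeps the code-heavy versus prose-heavy split visible.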
Derived#
- Opus 4.6 → 4.7 Changes and Multi-Agent Coding Considerations — synthesizes 4.6→4.7 deltas with role-assignment, context-budget, and safety considerations for multi-agent coding teams
