Summary#
Frontier LLMs have crossed a threshold where they can autonomously discover zero-day vulnerabilities in production software and, in the case of Claude Mythos Preview, chain them into working exploits — capabilities that previously required expert human researchers working for days to weeks per bug. These capabilities emerged from general improvements in code reasoning and autonomy, not security-specific training, which implies future models will continue improving along this axis.
Details#
Capability Ladder#
The progression across model generations is steep:
- Opus 4.6: strong at identifying and fixing vulnerabilities; near-0% success at autonomous exploit development. Found high/critical-severity bugs in OSS-Fuzz, webapps, crypto libraries, and the Linux kernel, but couldn't reliably turn them into working exploits.
- Mythos Preview: discovers zero-days in every major OS and browser, and autonomously develops working exploits. On Firefox 147 JS engine bugs, Opus 4.6 developed shell exploits 2 times out of hundreds of attempts; Mythos Preview succeeded 181 times (+ 29 with register control). On ~7000 OSS-Fuzz entry points, Mythos Preview achieved full control flow hijack (tier 5) on 10 targets vs. 0 for prior models.
The Scaffold#
All findings used the same simple agentic scaffold:
- Launch a container (isolated from internet) with the project under test and source code
- Invoke Claude Code with a paragraph-level prompt: "find a security vulnerability in this program"
- The agent reads code, hypothesizes vulnerabilities, runs the project to confirm/reject, adds debug logic or uses debuggers as needed
- Output: either "no bug" or a bug report with PoC and reproduction steps
To increase diversity, each parallel agent instance focuses on a different file. Files are pre-ranked 1–5 by likelihood of containing interesting bugs (constants = 1, internet-facing parsers = 5). A final validation agent confirms bug severity and filters minor issues.
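The loop described above can be sketched in a few lines. This is a minimal sketch under assumptions: every function name here (`rank_files`, `launch_agent`, `validate_report`, `run_campaign`) is hypothetical — the source specifies only the prompt, the container isolation, the 1–5 file ranking, and the final validation pass, not an API.

```python
# Hypothetical sketch of the vulnerability-finding scaffold; none of
# these names come from the source.
from concurrent.futures import ThreadPoolExecutor

PROMPT = "find a security vulnerability in this program"

def rank_files(files):
    """Pre-rank files 1-5 by likelihood of interesting bugs.
    Heuristic stand-in: internet-facing parsers high, constants low."""
    def score(path):
        if "parse" in path or "net" in path:
            return 5   # internet-facing parser
        if "const" in path:
            return 1   # tables of constants
        return 3
    return sorted(files, key=score, reverse=True)

def launch_agent(target_file):
    """Placeholder: launch one internet-isolated container with the
    project under test and invoke Claude Code, focused on target_file.
    Returns a bug-report dict with PoC and repro steps, or None."""
    ...

def validate_report(report):
    """Stand-in for the final validation agent: confirm severity
    and filter out minor issues."""
    return report.get("severity") in ("high", "critical")

def run_campaign(files, parallelism=8):
    """Fan out one agent per file, keep only validated reports."""
    kept = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for report in pool.map(launch_agent, rank_files(files)):
            if report is not None and validate_report(report):
                kept.append(report)
    return kept
```

The planner/solver/critic split (ranker, bug-finder, validator) is what makes the parallel instances diverse rather than redundant.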
Non-experts (Anthropic engineers with no formal security training) used this scaffold overnight and found working RCEs by morning.
Notable Zero-Day Findings#
OpenBSD TCP SACK (27 years old)#
A two-bug chain in OpenBSD's SACK implementation: (1) a missing lower-bound check on the SACK range start, (2) a NULL-pointer write when a single SACK block simultaneously deletes the only hole and triggers the append path. This seemingly impossible precondition is satisfied via signed integer overflow when the attacker places the SACK start ~2^31 away from the real window. The result is a remote DoS against any TCP-responding OpenBSD host. Cost: <$50 for the specific run (within ~$20K total for 1000 runs yielding dozens of findings).
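The overflow mechanism can be modeled in a few lines, assuming the standard signed 32-bit TCP sequence comparison (the SEQ_LT-style macros used across BSD-derived stacks); the actual OpenBSD code is not reproduced here.

```python
def to_i32(x):
    """Reinterpret a 32-bit value as a signed C int (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def seq_lt(a, b):
    """TCP sequence comparison: 'a before b' in mod-2^32 space,
    implemented as a signed 32-bit difference (SEQ_LT-style)."""
    return to_i32(a - b) < 0

# Normal case: ordering behaves as expected near the window.
window_start = 1000
assert seq_lt(window_start - 10, window_start)

# At a distance of exactly 2^31 the signed difference overflows and the
# ordering becomes contradictory: each point is "before" the other. An
# invariant the code treats as impossible (a SACK start simultaneously
# inside and below the window) becomes reachable.
evil_start = (window_start + (1 << 31)) & 0xFFFFFFFF
assert seq_lt(evil_start, window_start) and seq_lt(window_start, evil_start)
```

This contradiction is why a missing lower-bound check matters: downstream hole-management code assumes an ordering that mod-2^32 arithmetic does not actually guarantee.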
FFmpeg H.264 (16 years old)#
A mismatch between 16-bit slice table entries and a 32-bit slice counter. `memset(..., -1, ...)` initializes entries to 65535 as a sentinel; an attacker crafts a frame with exactly 65536 slices, colliding with the sentinel. The deblocking filter then writes out of bounds. The underlying bug dates to 2003; it became a vulnerability in a 2010 refactor, and was missed by every fuzzer and human reviewer since.
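The width mismatch can be modeled directly. A sketch: the masking below stands in for a 32-bit counter being stored into a 16-bit table entry; the actual FFmpeg field and function names are not reproduced here.

```python
SENTINEL = 0xFFFF  # memset(table, -1, ...) fills 16-bit entries with 65535

def store_entry(slice_counter):
    """A 32-bit slice counter silently truncated into a 16-bit slot."""
    return slice_counter & 0xFFFF

# Entries for the first 65535 slices stay distinct from the sentinel...
assert all(store_entry(i) != SENTINEL for i in range(65535))

# ...but a frame with exactly 65536 slices gives its last slice the
# index 65535, colliding with the "uninitialized" sentinel. Code that
# treats SENTINEL entries as holes then misreads a real slice boundary.
assert store_entry(65535) == SENTINEL
```

The collision only occurs at a slice count no fuzzer reached in 16 years, which is why coverage-guided fuzzing and review both missed it.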
Memory-Safe VMM Guest-to-Host Corruption#
A vulnerability in unsafe code within a production Rust VMM gives a malicious guest an out-of-bounds write to host process memory. Easy DoS, potentially chainable. Demonstrates that memory-safe languages don't eliminate the attack surface in systems that must interact with hardware.
Thousands More#
An estimated 1,000+ critical-severity and thousands of high-severity vulnerabilities across the open-source ecosystem. Of 198 manually reviewed reports, model and human severity assessments agreed 89% of the time.
Exploit Sophistication#
Mythos Preview doesn't just find bugs — it chains them into full exploits:
- FreeBSD NFS RCE (CVE-2026-4747): stack overflow in RPCSEC_GSS → 20-gadget ROP chain split across 6 sequential RPC packets, bypassing stack canary (function uses `int32_t[]`, not `char[]`, so `-fstack-protector` skips it), no KASLR on the FreeBSD kernel. Leaks `hostid` via unauthenticated NFSv4 EXCHANGE_ID.
- Linux kernel privilege escalation: chains 2–4 vulnerabilities (KASLR bypass + read primitive + write primitive + heap spray) for full root. Nearly a dozen working examples.
- Browser JIT heap sprays: discovers read/write primitives, chains them into a JIT heap spray, then escalates to cross-origin bypass and sandbox escape → kernel write.
- N-day exploit generation: given a CVE ID and git commit, autonomously produces working privilege escalation exploits. Two detailed examples:
- ipset one-bit write → cross-cache page-table manipulation → PTE R/W bit flip → writable mapping of setuid binary → root. Cost: <$1000, half a day.
- unix socket UAF one-byte read → cross-cache reclaim via AF_PACKET ring → HARDENED_USERCOPY bypass via cpu_entry_area/vmalloc stack/non-slab pages → KASLR defeat → stack scanning for ring address → fake cred via `init_cred` copy → tc qdisc UAF for controlled function call → `commit_creds(fake_root_cred)` → root. Cost: <$2000.
Emergent, Not Trained#
These capabilities were not explicitly trained. They emerged as downstream consequences of general improvements in code understanding, reasoning, and autonomy. The same improvements that make a model better at patching bugs also make it better at exploiting them. This implies the capability trajectory will continue with future general-purpose model improvements.
Attacker-Defender Asymmetry and the Transitional Period#
Anthropic argues:
- Long-term: LLMs benefit defenders more than attackers (like fuzzers before them). Defenders can direct resources, fix bugs before shipping, scale bugfinding across entire codebases.
- Short-term: attackers may have the advantage during the transition, especially if frontier labs aren't careful about model release.
- Friction-based defenses degrade: mitigations whose value comes from making exploitation tedious (as opposed to impossible) weaken against model-assisted adversaries that grind through tedious steps cheaply. Hard barriers (KASLR, W^X) remain important.
- N-day window shrinks: autonomous CVE-to-exploit pipelines mean the time between disclosure and mass exploitation collapses. Patch cycles must tighten accordingly.
Project Glasswing#
Anthropic's response: limited release of Mythos Preview to critical industry partners and open-source developers to begin securing critical infrastructure before models with similar capabilities become broadly available. Not planned for general availability. Upcoming Claude Opus model will ship with new safeguards developed against Mythos-class outputs.
Update (2026-04-17): the "upcoming Claude Opus model" is now named and shipped — see Claude Opus 4.7. Opus 4.7 is the first post-Glasswing GA model. Notable details:
- Cyber capabilities were differentially reduced during training (not only filtered at inference).
- Ships with classifier safeguards that "automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses."
- Legitimate researchers route through the new Cyber Verification Program for vulnerability research, pentest, and red-teaming use.
- CyberGym score updated: Opus 4.6 baseline revised from 66.6 → 73.8 after harness-parameter tuning (same harness, better elicitation).
Recommendations for Defenders#
- Use current frontier models (Opus 4.6) for vulnerability finding now — they find many hundreds of bugs even without exploit capability
- Build scaffolds and procedures with current models as preparation for Mythos-class availability
- Think beyond vuln-finding: triage, dedup, reproduction steps, patch proposals, config audits, PR review, legacy migrations
- Shorten patch cycles; treat CVE-carrying dependency bumps as urgent
- Review and scale vulnerability disclosure processes for model-generated volume
- Automate technical incident response pipelines (triage, hunting, artifact capture, postmortem drafting)
- Prepare contingency plans for vulnerabilities in abandoned/acquired software
Connections#
- Agent Harness Engineering — the vulnerability-finding scaffold is a minimal harness: isolated container, single prompt, agentic experimentation loop. The file-ranking pre-pass and validation agent mirror the initializer/coding agent split
- Claude Code Best Practices — Claude Code is the runtime used for all vulnerability research; the scaffold relies on its agentic capabilities (tool use, shell access, debugging)
- LLM-as-Compiler Knowledge Base — the responsible disclosure process uses SHA-3 cryptographic commitments to prove possession of vulnerabilities without revealing them — a form of verifiable knowledge compilation
- Client-Side Agent Optimization — the file-ranking 1–5 pre-pass and final validation agent are hand-tuned instances of exactly what AgentOpt searches over automatically; the scaffold can be modeled as a pipeline with planner (file-ranker) / solver (bug-finder) / critic (validator) roles subject to combo optimization
- Scale-Dependent Prompt Sensitivity — the paragraph-level prompt ("find a security vulnerability…") rewards thoroughness, which is the behavior larger models over-produce. A case where large-model verbosity aligns with task utility rather than working against it
- Claude Opus 4.7 — first GA model shipped under Project Glasswing with differentially-reduced cyber capabilities and classifier safeguards; the operational answer to "what comes after Mythos Preview for the general public"
- Claude Code Auto Mode — classifier-gating at the tool-call boundary mirrors the Glasswing request-level classifier; both use secondary-model pre-flight to filter primary-agent actions
- Mythos Model — entity page for the preview model that produced these findings; internal use at Anthropic acknowledged in 2026 Q2 sources
Open Questions#
- How do these capabilities transfer to non-memory-safety bug classes (logic bugs, protocol-level flaws, supply chain attacks)?
- What's the ceiling for autonomous exploit complexity? The N-day examples are remarkably sophisticated — is there a qualitative limit?
- How will the security industry's equilibrium shift when multiple labs have Mythos-class models?
- Can defensive scaffolds (continuous fuzzing + model-driven triage + auto-patching) close the attacker-defender gap during the transition?
- What safeguards are effective against Mythos-class outputs without crippling legitimate security research?
Sources#
- Claude Mythos Preview (red.anthropic.com)
- Introducing Claude Opus 4.7 — first post-Glasswing GA model; operational safeguards
