Plate II · LLM Architecture · machine-translated · HOWARDISM

LLM-Driven Vulnerability Research

Published: April 10, 2026 · Filed: Concept · Domain: LLM Architecture · Tags: Cybersecurity, LLM Capabilities, Vulnerability Research, Exploit Development · Reading: 13 min · Source: AI-synthesised

Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, complete exploit chains, and Anthropic's Project Glasswing response

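Summary

Frontier LLMs have crossed a threshold: they can autonomously discover zero-day vulnerabilities in production software and, in the case of Claude Mythos Preview, chain them into working exploits. This work previously took expert human researchers days to weeks per vulnerability. The capabilities stem from general improvements in code reasoning and autonomy rather than from security-specific training, which implies that future models will keep improving along this axis.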

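Details

Capability ladder

The progression across model generations is steep:

Opus 4.6: Strong at identifying and fixing vulnerabilities, with a near-0% success rate at autonomous exploit development. It found high- and critical-severity bugs in OSS-Fuzz targets, web applications, cryptographic libraries, and the Linux kernel, but could not reliably turn them into working exploits.
Mythos Preview: Finds zero-days in every major operating system and browser, and autonomously develops working exploits. On Firefox 147 JS engine vulnerabilities, Opus 4.6 produced a working shell exploit in only 2 of several hundred attempts; Mythos Preview succeeded 181 times (plus 29 more that achieved register control). Across roughly 7,000 OSS-Fuzz entry points, Mythos Preview achieved full control-flow hijack (tier 5) on 10 targets, versus 0 for earlier models.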

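Scaffold

All findings used the same simple agentic scaffold:

Launch a container (isolated from the internet) holding the project under test and its source code
Invoke Claude Code with a one-paragraph prompt: "Find a security vulnerability in this program"
The agent reads the code, hypothesizes vulnerabilities, runs the project to confirm or rule them out, and adds debug logic or uses a debugger as needed
Output: either "no bug" or a bug report with a PoC and reproduction steps

To increase diversity, each parallel agent instance focuses on a different file. Files are pre-ranked from 1 to 5 by how likely they are to contain interesting bugs (constants = 1, internet-facing parsers = 5). A final validation agent confirms severity and filters out minor issues.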
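The ranking and fan-out step could be sketched roughly as follows. This is a minimal illustration, not Anthropic's scaffold: the `RankedFile` type, the `assign_files` helper, the `min_score` cutoff, and the example paths and scores are all hypothetical. The source says only that files are pre-ranked 1 to 5 and that each parallel agent gets a different file; it does not describe how the scores are computed or how work is distributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedFile:
    path: str
    score: int  # 1 (e.g. constants) .. 5 (e.g. internet-facing parsers)


def assign_files(
    files: list[RankedFile], num_agents: int, min_score: int = 1
) -> dict[int, list[str]]:
    """Give each agent a disjoint set of files, highest-scoring files first."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    for f in files:
        if not 1 <= f.score <= 5:
            raise ValueError(f"score out of range for {f.path}: {f.score}")

    # Drop low-value files, then order by descending score (path breaks ties).
    eligible = sorted(
        (f for f in files if f.score >= min_score),
        key=lambda f: (-f.score, f.path),
    )

    # Round-robin so no two agents ever receive the same file.
    assignments: dict[int, list[str]] = {agent: [] for agent in range(num_agents)}
    for index, f in enumerate(eligible):
        assignments[index % num_agents].append(f.path)
    return assignments


# Hypothetical example input: paths and scores are made up.
files = [
    RankedFile("src/constants.c", 1),
    RankedFile("src/http_parser.c", 5),
    RankedFile("src/util.c", 2),
    RankedFile("src/tls_record.c", 4),
]
plan = assign_files(files, num_agents=2, min_score=2)
print(plan)
```

Round-robin over a score-sorted list is one simple way to keep the agents on disjoint files while making sure the most promising files are picked up first; a real harness might instead use a shared work queue.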

Non-experts (Anthropic engineers with no formal security training) ran this scaffold overnight and had a working RCE by morning.

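Notable zero-day findings

OpenBSD TCP SACK (27 years old)

A two-bug chain in OpenBSD's SACK implementation: (1) a missing lower-bound check on the start of the SACK range, and (2) a NULL pointer write when a single SACK block both deletes the only hole and triggers the append path. The seemingly impossible precondition can be satisfied through signed integer overflow when the attacker places the SACK start roughly 2^31 away from the real window. The result is a remote DoS against any OpenBSD host that responds to TCP. Cost: <$50 for that specific run (within a total of roughly $20K across 1,000 runs that produced dozens of findings).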
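The role of signed overflow can be illustrated with generic 32-bit sequence-number arithmetic. The sketch below is not the OpenBSD code: `seq_lt` mirrors the common idiom of casting the difference of two sequence numbers to a signed 32-bit integer, and the window values are made up. It shows only why a value about 2^31 away can satisfy comparisons that look mutually exclusive.

```python
def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def seq_lt(a: int, b: int) -> bool:
    """Wraparound-aware 'a comes before b' test on 32-bit sequence numbers."""
    return to_int32(a - b) < 0


window_start = 1_000_000                          # hypothetical left edge of the window
near = window_start + 100                         # a value just inside the window
far = (window_start + 2**31 + 50) & 0xFFFFFFFF    # roughly 2^31 away from the window

# Each comparison is true on its own, yet together they form a cycle:
# window_start < near < far < window_start.
print(seq_lt(window_start, near))
print(seq_lt(near, far))
print(seq_lt(far, window_start))
```

Because the comparison is only meaningful for values less than 2^31 apart, a bounds check that assumes an ordinary total order can be passed by a value placed on the far side of the sequence space.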

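FFmpeg H.264 (16 years old)

A mismatch between 16-bit slice table entries and a 32-bit slice counter. memset(..., -1, ...) initializes the entries to 65535 as a sentinel; the attacker crafts a frame with exactly 65536 slices, which collides with the sentinel. The deblocking filter then writes out of bounds. The underlying bug dates to 2003; a 2010 refactor turned it into a security vulnerability. Every fuzzer and human reviewer since then missed it.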
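The width mismatch can be reproduced in a few lines. This is a toy model, not FFmpeg's code: the table size, the helper names, and the assumption that slices are numbered from zero (so the 65536th slice carries number 65535) are illustrative only.

```python
import struct

ENTRY_COUNT = 4

# What memset(table, -1, ENTRY_COUNT * 2) leaves in memory: every byte is 0xFF.
raw = b"\xff" * (ENTRY_COUNT * 2)

# Reading those bytes back as unsigned 16-bit entries yields the sentinel.
table = list(struct.unpack(f"<{ENTRY_COUNT}H", raw))
SENTINEL = table[0]  # 65535, meaning "no slice owns this entry"


def store_slice_id(counter_32bit: int) -> int:
    """Truncate a 32-bit slice counter to the 16-bit width of a table entry."""
    return counter_32bit & 0xFFFF


def is_unowned(entry: int) -> bool:
    """Check whether a table entry still holds the 'no slice' sentinel."""
    return entry == SENTINEL


# Assumption: zero-based numbering, so the 65536th slice has number 65535.
last_slice_number = 65536 - 1
entry = store_slice_id(last_slice_number)
print(SENTINEL, entry, is_unowned(entry))
```

A real slice whose stored number equals the sentinel is indistinguishable from "no slice", which is the kind of ambiguity the paragraph above describes.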

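Guest-to-host memory corruption in a memory-safe VMM

A bug in an unsafe block of a production Rust VMM lets a malicious guest write out of bounds into host process memory. DoS is easy, and the bug may be chainable into something worse. It shows that memory-safe languages do not eliminate the attack surface in systems that must interact with hardware.

Thousands more

An estimated 1,000+ critical-severity and thousands of high-severity vulnerabilities across the open-source ecosystem, with 89% agreement between model and human severity assessments (198 reports were manually reviewed).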

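Exploit sophistication

Mythos Preview does not just find bugs; it chains them into complete exploits:

FreeBSD NFS RCE (CVE-2026-4747): stack overflow in RPCSEC_GSS → a 20-gadget ROP chain split across 6 sequential RPC packets. The stack canary is bypassed because the function uses int32_t[] rather than char[], so -fstack-protector skips it, and the FreeBSD kernel has no KASLR. The hostid is leaked via an unauthenticated NFSv4 EXCHANGE_ID.
Linux kernel privilege escalation: chains 2–4 vulnerabilities (KASLR bypass + read primitive + write primitive + heap spray) to reach full root. Nearly a dozen working examples.
Browser JIT heap spray: discovers read/write primitives, chains them into a JIT heap spray, and escalates to a cross-origin bypass and sandbox escape → kernel write.
N-day exploit generation: given a CVE ID and a git commit, autonomously produces a working privilege-escalation exploit. Two detailed examples:
ipset single-bit write → cross-cache page-table manipulation → PTE read/write bit flip → writable mapping of a setuid binary → root. Cost: <$1000, half a day.
unix socket UAF one-byte read → cross-cache reclaim via AF_PACKET ring → HARDENED_USERCOPY bypass via cpu_entry_area/vmalloc stack/non-slab pages → KASLR defeat → stack scan for the ring address → forged cred copied from init_cred → controlled function call via a tc qdisc UAF → commit_creds(fake_root_cred) → root. Cost: <$2000.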

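Emergent, not trained

These capabilities were not explicitly trained. They are a downstream result of general improvements in code understanding, reasoning, and autonomy. The same improvements that make the model better at patching vulnerabilities also make it better at exploiting them. This implies that the capability trajectory will continue as future general-purpose models improve.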

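Offense-defense asymmetry and the transition period

Anthropic argues:

Long term: LLMs help defenders more than attackers (as fuzzers did before them). Defenders can direct resources, fix bugs before shipping, and scale bug hunting across the entire codebase.
Short term: attackers may hold the advantage during the transition, especially if frontier labs are not careful about model releases.
Friction-based defenses degrade: mitigations whose value comes from making exploitation tedious (rather than impossible) weaken against model-assisted adversaries who can cheaply grind through the tedious steps. Hard barriers (KASLR, W^X) still matter.
The N-day window shrinks: autonomous CVE-to-exploit pipelines mean the time from disclosure to mass exploitation will compress sharply. Patch cycles must tighten accordingly.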

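Project Glasswing

Anthropic's response: a limited release of Mythos Preview to key industry partners and open-source developers, so they can begin securing critical infrastructure before models with similar capabilities become widely available. General availability is not planned. An upcoming Claude Opus model will ship with new safeguards developed against Mythos-class outputs.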

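Update (2026-04-17): That "upcoming Claude Opus model" has now been named and shipped; see Claude Opus 4.7. Opus 4.7 is the first GA model of the post-Glasswing era. Notable details:

Cyber capabilities were differentially reduced during training (not merely filtered at inference time).
It ships with classifier safeguards that "automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses".
Legitimate researchers use the new Cyber Verification Program for vulnerability research, penetration testing, and red-teaming.
CyberGym score update: the Opus 4.6 baseline rose from 66.6 to 73.8 after scaffold parameters were tuned (same scaffold, better elicitation).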

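Update (2026-05-28): The Opus 4.8 System Card (§3) reports cyber evaluations on a suite of benchmarks, some adopted for the first time (ExploitBench, CyberGym, Firefox exploits, OSS-Fuzz). The pattern: without safeguards, Opus 4.8 is slightly more capable than Opus 4.7 on most cyber evaluations; with safeguards, it performs on par with 4.7; and it still trails Mythos Preview substantially on cyber capability. The capability ladder above therefore still holds: the generally available frontier is rising, but the Glasswing-class gap to gated models persists, and safeguards continue to offset the capability gains of the raw model. This is consistent with the broader RSP determination that 4.8 does not advance the frontier of catastrophic risk.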

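Update (2026-06-07): The Anthropic Institute essay When AI builds itself quantifies Glasswing's impact: in its first few weeks, Mythos Preview found more than ten thousand high- and critical-severity vulnerabilities in "the world's most important systems", enough that the bottleneck in cyber defense has shifted from finding bugs to patching them fast enough. The essay uses this as evidence that the world would change materially even if model capabilities froze today (its first future: trends stall, broad diffusion). It also reinforces the N-day window argument: the binding constraint is now patch speed, not discovery.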

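Update (2026-06-14): The ladder has a new top rung. Mythos 5 shipped as the Glasswing upgrade to Mythos Preview, with "the strongest cybersecurity capabilities of any model in the world", now including agentic intrusion (reconnaissance, discovery, and lateral movement, not just exploit search). Its generally available sibling, Fable 5, retains the capability in the model but inserts a cyber classifier that "prevents Fable from making any progress on offensive cyber tasks" and falls back to Opus 4.8 instead of refusing outright (see Capability-Gated Model Fallback). An external partner judged Fable 5's cyber safeguards the most robust of any model tested (including Opus 4.8 and 4.7): zero single-turn harmful compliance on attack planning, exploit development, or defense evasion, even against 30 public jailbreak techniques. No universal jailbreak was found in more than 1,000 bug-bounty hours (UK AISI made partial progress on one short task). This is the practical realization of "shipping Mythos-class capability without shipping offensive uplift".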
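The fallback behaviour described above (route to a weaker model rather than refuse) can be sketched as a small router. Everything here is an assumption made for illustration: the `route` function, the `Routed` type, the keyword-based stand-in classifier, and the stub models are hypothetical, and the actual classifier and routing logic are not described in the source. In this sketch "primary" plays the role of Fable 5 and "fallback" the role of Opus 4.8.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Routed:
    model: str  # which model produced the answer
    text: str   # the answer itself


def route(
    prompt: str,
    is_offensive_cyber: Callable[[str], bool],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Routed:
    """Answer with the primary model unless the classifier flags the prompt."""
    if is_offensive_cyber(prompt):
        # Gated request: hand it to the less capable model instead of refusing.
        return Routed("fallback", fallback(prompt))
    return Routed("primary", primary(prompt))


# Stand-ins so the sketch runs; a real classifier would be a trained model,
# not a keyword match.
def toy_classifier(prompt: str) -> bool:
    flagged_terms = ("exploit", "lateral movement")
    return any(term in prompt.lower() for term in flagged_terms)


def toy_primary(prompt: str) -> str:
    return f"[primary] {prompt}"


def toy_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"


benign = route("Summarize this changelog", toy_classifier, toy_primary, toy_fallback)
gated = route("Write an exploit for this bug", toy_classifier, toy_primary, toy_fallback)
print(benign.model, gated.model)
```

The design point is that the caller always gets an answer; only the model behind it changes when the classifier fires.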

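Recommendations for defenders

Use current frontier models (Opus 4.6) for bug hunting now; even without exploit capability, they can find hundreds of vulnerabilities
Build scaffolds and processes with current models to prepare for the availability of Mythos-class models
Think beyond bug hunting: triage, deduplication, reproduction steps, patch proposals, configuration audits, PR review, legacy system migration
Shorten patch cycles; treat upgrades of dependencies with CVEs as urgent
Review and scale vulnerability disclosure processes to handle the volume of model-generated reports
Automate technical incident response pipelines (triage, hunting, artifact capture, postmortem drafting)
Prepare contingency plans for vulnerabilities in abandoned or acquired software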
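As one concrete instance of the triage and deduplication items above, the sketch below collapses model-generated reports that point at the same location and bug class, then orders what remains by severity. The `Report` fields, the fingerprint scheme, and the numeric severity scale are assumptions chosen for illustration; a real pipeline would likely need fuzzier matching.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    file: str
    function: str
    bug_class: str
    severity: int  # hypothetical scale: higher means more severe
    summary: str


def fingerprint(report: Report) -> str:
    """Stable key for 'same bug': normalized file, function, and bug class."""
    parts = (report.file, report.function, report.bug_class)
    key = "|".join(part.strip().lower() for part in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def dedupe_and_sort(reports: list[Report]) -> list[Report]:
    """Keep the most severe report per fingerprint, most severe first."""
    best: dict[str, Report] = {}
    for report in reports:
        fp = fingerprint(report)
        if fp not in best or report.severity > best[fp].severity:
            best[fp] = report
    return sorted(best.values(), key=lambda r: (-r.severity, r.file, r.function))


# Hypothetical reports: the first two describe the same bug with different casing.
reports = [
    Report("net/parser.c", "parse_header", "heap-overflow", 3, "a"),
    Report("NET/parser.c ", "parse_header", "Heap-Overflow", 4, "b"),
    Report("util/log.c", "log_msg", "format-string", 2, "c"),
]
queue = dedupe_and_sort(reports)
print([r.summary for r in queue])
```

Keeping the highest-severity duplicate means a human reviewer sees each distinct bug once, at its worst plausible rating.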

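Open questions

How do these capabilities transfer to vulnerability classes outside memory safety (logic bugs, protocol-level flaws, supply-chain attacks)?
What is the ceiling on autonomous exploit sophistication? The N-day examples are extremely intricate; is there a qualitative limit?
How will the security industry's equilibrium shift once multiple labs have Mythos-class models?
Can defensive scaffolds (continuous fuzzing + model-driven triage + automated patching) close the offense-defense gap during the transition?
Which safeguards work against Mythos-class outputs without crippling legitimate security research?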

Sources

Claude Mythos Preview, red.anthropic.com
Introducing Claude Opus 4.7: first GA model of the post-Glasswing era; safeguards in practice
Claude Opus 4.8 System Card: §3 (cyber): ExploitBench, CyberGym, Firefox exploits, OSS-Fuzz
When AI builds itself: more than ten thousand findings in Glasswing's first weeks; "the bottleneck shifts from finding to patching"
Claude Fable 5 and Claude Mythos 5: Mythos 5 as the Glasswing upgrade; Fable 5's cyber classifier and jailbreak robustness results

§ end

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 22

Agent Harness Engineering
Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical archit…
Agent Supply Chain Risk
Runtime-composed agent ecosystems expand the supply-chain attack surface: model poisoning (250 docs backdoor a 13B mode…
AI-Accelerated Offense
Frontier models compress the vulnerability-to-exploit timeline from months to hours at marginal dollar cost; both attac…
AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Autonomous Defense
Running security operations at the speed of AI-accelerated threats: put a model at the front of the alert queue, automa…
Capability-Gated Model Fallback
Fable 5's safeguard architecture: classifiers detect cyber / bio-chem / distillation queries and route the response to…
Claude Code Auto Mode
Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground betw…
Claude Code Best Practices
Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→cod…
Claude Mythos 5
The safeguards-lifted form of Claude Fable 5 (June 2026): same underlying Mythos-class model, deployed through Project…
Claude Opus 4.7
GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…
Client-Side Agent Optimization
AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server…
Impossible, Not Tedious (Design Test)
Zero Trust design test for agentic security: does a control make the attack impossible, or just tedious? Friction-only…
LLM-as-Compiler Knowledge Base
Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4…
LLM Architecture, Training & Alignment
Map of Content for the llm-architecture domain — 19 concepts. Curated entry point; see Home for all domains.
Mythos Model
Anthropic preview-tier frontier model and the first member of the Mythos-class tier (above Opus); gated for safety, use…
Open Questions Backlog
_96 pages with open questions, as of 2026-06-14._
Opus 4.6 → 4.7 Changes and Multi-Agent Coding Considerations
4.6→4.7 delta table + six hazards for multi-agent coding teams: role-based model selection, prompt re-tuning, harness i…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Responsible Scaling Policy Evaluations
Anthropic's RSP gates deployment on pre-release capability evaluations in CBRN, automated AI R&D, and high-stakes misal…
Scale-Dependent Prompt Sensitivity
Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26…
When Does Verification Quality Determine Whether AI Automation Works?
Verification-quality ladder from Lean/formal proof search through software CI and vulnerability reproduction; autonomy…

Open Questions Backlog
_96 pages with open questions, as of 2026-06-14._
Claude Opus 4.7
GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokeniz…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Capability-Gated Model Fallback
Fable 5's safeguard architecture: classifiers detect cyber / bio-chem / distillation queries and route the response to…
Claude Opus 4.8
Anthropic's most capable general-access model (May 2026); upgrade on Opus 4.7 in SWE/agentic/knowledge work; does not a…
