Howardism · vol. 03 · quiet corner of the web


Anthropic Institute

Published June 7, 2026 · Filed: Entity · Domain: Entities · Tags: Entity, Org, AI Policy, Governance, Anthropic · Reading: 3 min · Source: AI-synthesised

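Anthropic's policy and governance research arm; published Favaro and Clark's 2026 *When AI builds itself*, on recursive self-improvement; its agenda includes building the verification systems that a credible multilateral AI slowdown would require.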


Sources

When AI builds itself

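Summary

The Anthropic Institute is Anthropic's research and policy arm, focused on the societal and governance implications of frontier AI. It published *When AI builds itself* (June 2026), this wiki's primary source on recursive self-improvement, and set out an explicit agenda: working with other partners to build the systems that a credible AI slowdown or pause would require (frontier pause verification).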

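What it does

Public-facing trajectory analysis. *When AI builds itself* combines public benchmarks (lengthening task time horizons) with previously unpublished internal Anthropic data (AI accelerating AI development) to argue that AI is already speeding up AI development, and it sketches three possible futures for recursive self-improvement (RSI).
Coordination infrastructure. The Institute plans to work with many other partners on research and to take action to help build the systems a credible slowdown or pause would require: ways to verify that other developers have actually stopped, and that bad actors cannot exploit a coordinated slowdown to pull ahead in secret (frontier pause verification).
Convening. In the months after the paper's publication, the Institute plans to host conversations among policymakers, researchers, civil society, and other AI companies and to publish the results, explicitly inviting voices from outside AI companies into the discussion.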

People

Marina Favaro and Jack Clark co-authored *When AI builds itself* (editorial support from Santi Ruiz; visuals by Shan Carter, Romello Goodman, and Nikki Makagiansar; data from Brian Calvert and Jun Shern Chan).

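Related links

Anthropic: parent organization
Recursive Self-Improvement: subject of the Institute's flagship paper
Frontier Pause Verification: the Institute's concrete governance agenda
AI Accelerating AI Development: the internal evidence base the paper draws on
Responsible Scaling Policy Evaluations: the Institute's external coordination work complements Anthropic's internal RSP braking mechanism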

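Open questions

How does the Institute's policy stance (leaning toward keeping the option of a pause open) interact with Anthropic's commercial incentive to ship frontier models? The paper acknowledges competitive and geopolitical pressures but does not resolve this tension.
Which concrete verification mechanisms will the Institute pilot, and how will that timeline compare with the RSI trends it warns about?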

Sources

When AI builds itself: Anthropic Institute, *When AI builds itself* (Marina Favaro and Jack Clark, June 2026)

§ end

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

Cited by 10

AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
AI R&D Autonomy Evaluation (AECI)
How Anthropic measures whether a model can automate or dramatically accelerate AI research — the capability that drives…
Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
Frontier Pause Verification
The arms-control problem of a credible, verifiable slowdown or pause of frontier AI: detectability is harder than for o…
LLM-Driven Vulnerability Research
Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and An…
METR
Independent AI-evaluation org behind the 'time horizons' benchmark — the task length a model can complete reliably on i…
Entities — People, Orgs, Tools & Projects
Map of Content for all 55 entity pages. See Home for concept domains.
Mythos Model
Anthropic preview-tier frontier model and the first member of the Mythos-class tier (above Opus); gated for safety, use…
Open Questions Backlog
396 actionable open questions across 155 pages · 79 predictions · 9 notes · 21 in progress · 59 watching (entities), a…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…

Related articles

AI Accelerating AI Development
The empirical core of *When AI builds itself*: measured evidence AI already speeds AI R&D at Anthropic — >80% of merged…
Recursive Self-Improvement
An AI system autonomously designing and developing its own successor; Anthropic Institute's *When AI builds itself* arg…
Mythos Model
Anthropic preview-tier frontier model and the first member of the Mythos-class tier (above Opus); gated for safety, use…
Responsible Scaling Policy Evaluations
Anthropic's RSP gates deployment on pre-release capability evaluations in CBRN, automated AI R&D, and high-stakes misal…
Claude Opus 4.8
Anthropic's most capable general-access model (May 2026); upgrade on Opus 4.7 in SWE/agentic/knowledge work; does not a…
