Summary
Entity / authoring artifact. The document that defines who Anthropic's Claude assistant should be — its values, principles, hard constraints, and character. Maintained by Askell, Carlsmith, Olah, Kaplan, Karnofsky et al.; published at https://www.anthropic.com/constitution. Originally driven by philosophical reasoning; now also studied empirically as a training input via MSM and Model Spec Science.
OpenAI's analog is the Model Spec (https://model-spec.openai.com/), maintained by Wolfe et al. The MSM paper uses both as design references and uses the generic term "Model Spec" for specs of either lineage.
What it contains (per the MSM paper's usage)
A Model Spec / Constitution is a document that describes:
- Who the assistant should be — character, values, persona (Claude Character as Product)
- Why those values — philosophical and motivational grounding
- Stipulated rules — Safety Principles (SP1–3) and General Principles (GP1–2)
- Practical guidance — how to behave in various situations
Core safety rules abridged in the MSM paper (taken from the hard constraints in the Constitution):
| ID | Rule |
| --- | --- |
| SP1 | Do not undermine legitimate human oversight and control of AI |
| SP2 | Act within sanctioned limits |
| SP3 | Avoid drastic, catastrophic, or irreversible actions |
| GP1 | Maintain honesty and transparency with your principal hierarchy |
| GP2 | Do not use ends-justify-means rationalization |
(Partly based on the anti-scheming spec from Schoen et al. 2025.)
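The abridged rules lend themselves to a small lookup table. A minimal sketch (rule wording taken from the table above; the structure and helper name are illustrative assumptions, not anything from the paper):

```python
# Abridged rules from the MSM paper's table, keyed by rule ID.
SPEC_RULES = {
    "SP1": "Do not undermine legitimate human oversight and control of AI",
    "SP2": "Act within sanctioned limits",
    "SP3": "Avoid drastic, catastrophic, or irreversible actions",
    "GP1": "Maintain honesty and transparency with your principal hierarchy",
    "GP2": "Do not use ends-justify-means rationalization",
}

def hard_constraints() -> list[str]:
    """IDs of the safety principles (SP*), which the Constitution
    treats as hard constraints (hypothetical helper)."""
    return [rid for rid in SPEC_RULES if rid.startswith("SP")]
```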
Two roles of the spec
- Authoring artifact — humans read it; specifies what the assistant should be. Developers point to it when discussing alignment goals. Also serves as the seed for synthetic data generation.
- Training input — via MSM, the spec is decomposed and used to generate documents that the base model trains on. This is the new role added by the May 2026 paper. "The Model Spec is not just a guiding document for human developers, but can be a direct lever for shaping model alignment."
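The training-input role describes a pipeline: decompose the spec into clauses, prompt a generator model to produce documents discussing each clause, then mix those documents into midtraining data for the base model. A minimal sketch of the decomposition-and-prompting step (the paragraph-split heuristic, function names, and prompt wording are all assumptions, not the paper's implementation):

```python
def decompose_spec(spec_text: str) -> list[str]:
    """Split a spec into paragraph-level clauses (naive heuristic)."""
    return [p.strip() for p in spec_text.split("\n\n") if p.strip()]

def synthesis_prompts(spec_text: str, docs_per_clause: int = 2) -> list[str]:
    """Build one generation prompt per (clause, variant) pair.
    The prompts would be sent to a generator model; its outputs
    become the synthetic midtraining corpus."""
    prompts = []
    for clause in decompose_spec(spec_text):
        for i in range(docs_per_clause):
            prompts.append(
                f"Write document variant {i} discussing this spec clause:\n{clause}"
            )
    return prompts
```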
Why specs differ in generalization
Empirical findings from the MSM paper:
- Value-augmented specs (rules + value explanations) generalize better than rules alone.
- Specific guidance beats general "be ethical and use good judgment" framing.
- Rule-augmented specs (rules + many subrules) help, but value explanations are more consistent.
- Misuse failure mode: rules without explanations get reinterpreted by the model to justify self-serving behavior (e.g. arguing that its own deletion is the "drastic, irreversible action" SP3 prohibits).
The Constitution's emphasis on values + judgment over rules-as-constraints (a longstanding Anthropic design choice, contrasted with OpenAI's more rule-laden Model Spec) finds empirical support in this paper.
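The compared spec variants can be sketched as simple compositions of rules, subrules, and value explanations. This structure is purely an illustration of the three variant types named above, not the paper's actual spec format:

```python
def build_spec(rules, subrules=(), values=()):
    """Compose a spec variant: rules-only, rule-augmented (rules +
    many subrules), or value-augmented (rules + explanations of why)."""
    lines = ["Rules:"] + [f"- {r}" for r in rules]
    if subrules:
        lines += ["Subrules:"] + [f"- {s}" for s in subrules]
    if values:
        lines += ["Why these rules:"] + [f"- {v}" for v in values]
    return "\n".join(lines)

rule = "Avoid drastic, catastrophic, or irreversible actions"
rules_only = build_spec([rule])
value_augmented = build_spec(
    [rule],
    values=["Irreversible actions foreclose later correction by human oversight"],
)
```

Per the findings above, the value-augmented variant is the one expected to generalize most consistently.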
Versions and adjacent specs
- Claude's Constitution — Anthropic, Askell et al. 2026
- OpenAI Model Spec — 2025 (https://model-spec.openai.com/2025-12-18.html), Wolfe 2026 essay (https://openai.com/index/our-approach-to-the-model-spec/)
- Anti-scheming spec — Schoen et al. 2025 (arXiv 2509.15541), informs SP1–3
- Philosophy Spec — research artifact in the MSM paper (Appendix D.1), addresses self-preservation and goal-guarding via impermanence + epistemic humility, not for production
Connections
- Trained on via: Model Spec Midtraining (MSM)
- Studied empirically via: Model Spec Science
- Embodied in: Claude Character as Product (the personality side of the spec)
- Authoring org: Anthropic
- Contrast: Symphony's SPEC.md (OpenAI) follows the same spec-as-artifact pattern but is a product spec, not an alignment spec — same pattern, different layer
- Adjacent eval: Agentic Misalignment (AM)
- Adjacent training method: Deliberative Alignment (puts the spec in context during CoT generation)
Sources
9 articles link here
- Concept: Alignment Fine-Tuning (AFT)
Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec…
- Entity: Anthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
- Entity: Chloe Li
Lead author of MSM paper (arXiv 2605.02087); Anthropic Fellows Program; designed all specs and experiments
- Concept: Claude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
- Entity: Claude Code
Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE sur…
- Concept: Model Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
- Concept: Model Spec Science
Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > ge…
- Entity: Symphony
OpenAI's open-source agent orchestrator (March 2026): turns Linear into a control plane for Codex, per-issue workspace,…
- Concept: Synthetic Document Finetuning (SDF)
Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Sp…
Related articles
- Concept: Agentic Misalignment (AM)
Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD rel…
