Sources#
Summary#
Entity. Lead author of "Model Spec Midtraining: Improving How Alignment Training Generalizes" (arXiv 2605.02087, May 2026). Member of the Anthropic Fellows Program. Designed the MSM specs, proposed and designed the experiments, produced all results, wrote the paper.
Contributions#
Per Author Contributions (App. A of the MSM paper):
- Led the project
- Designed the Model Specs used (cheese-preference specs, Philosophy Spec, Rules/Value-Augmented/Rule-Augmented specs, General Spec)
- Proposed and designed all experiments
- Produced all results
- Wrote the paper
Co-authors: Sara Price (Anthropic; advised initial phase), Jon Kutasov + Samuel Marks (jointly supervised; Jon proposed the project, Sam guided the controlling-generalization framing).
Code release#
Open-sourced the full MSM pipeline, AFT pipeline, Model Specs, and trained models: https://github.com/chloeli-15/model_spec_midtraining
Connections#
- Author of: Model Spec Midtraining (MSM) paper
- Affiliated with: Anthropic (Fellows Program)
- Co-authors: Sara Price, Jon Kutasov, Samuel Marks (Anthropic)
- Adjacent work: Synthetic Document Finetuning (SDF) (Wang et al., the technique MSM builds on)
Sources#
3 articles link here
- EntityAnthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
- ConceptModel Spec Midtraining (MSM)
New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT…
- ConceptModel Spec Science
Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > ge…
Related articles
- ConceptAgentic Misalignment (AM)
Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD rel…
- ConceptAlignment Fine-Tuning (AFT)
Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec…
- EntityAnthropic
AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs r…
- ConceptClaude Character as Product
Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the…
- EntityClaude's Constitution / Model Spec
Anthropic Model Spec / Constitution by Askell et al.; document specifying Claude's values + hard constraints (SP1–3, GP…
