
Chloe Li

Published: May 8, 2026 · Filed: Entity · Reading time: 2 min · Source: AI-synthesised

Lead author of MSM paper (arXiv 2605.02087); Anthropic Fellows Program; designed all specs and experiments

Summary

Entity. Chloe Li is the lead author of "Model Spec Midtraining: Improving How Alignment Training Generalizes" (arXiv 2605.02087, May 2026) and a member of the Anthropic Fellows Program. She designed the MSM specs, proposed and designed the experiments, produced all results, and wrote the paper.

Contributions

Per Author Contributions (App. A of the MSM paper):

  • Led the project
  • Designed the Model Specs used (cheese-preference specs, Philosophy Spec, Rules/Value-Augmented/Rule-Augmented specs, General Spec)
  • Proposed and designed all experiments
  • Produced all results
  • Wrote the paper

Co-authors: Sara Price (Anthropic; advised the initial phase), and Jon Kutasov and Samuel Marks (joint supervisors; Jon proposed the project, Sam guided the controlling-generalization framing).

Code release

Open-sourced the full MSM pipeline, the alignment fine-tuning (AFT) pipeline, the Model Specs, and the trained models: https://github.com/chloeli-15/model_spec_midtraining

