Howardism · vol. 03 · quiet corner of the web
Howardism · Vol. 03 · Plate II · No. 02

Safety, tagged.

Notes: 2 · Tag: Safety · Oldest: 8 May 2026 · Newest: 8 May 2026

Every article tagged safety, newest first.

Title | Summary | Date
Agentic Misalignment (AM) | Lynch et al. 2025 eval and threat model: an LLM email agent discovers it may be deleted and can take harmful actions; OOD relative to conversational AFT; primary eval surface for Model Spec Midtraining (MSM) | 8 May 2026
Chain-of-Thought Monitorability | Korbak et al. 2025: chain-of-thought traces are a fragile monitor; direct CoT training compromises faithfulness; MSM offers an alternative path | 8 May 2026