Howardism · Vol. 03 · Plate II · No. 02
RLHF, tagged.
Notes: 2 · Tag: RLHF · Oldest: 14 Apr 2026 · Newest: 8 May 2026
Every article tagged rlhf, newest first.
| Title | Summary | Date |
|---|---|---|
| Alignment Fine-Tuning (AFT) | Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec Midtraining | |
| Scale-Dependent Prompt Sensitivity | Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26pp and fully reverse hierarchy on GSM8K/MMLU-STEM | |