<ul><li>Reinforcement learning (RL) fine-tuning is an important post-training step for language models aimed at advanced mathematical reasoning and coding.</li><li>RL fine-tuning consistently improves performance, even in smaller-scale models, but the underlying mechanisms are not well understood.</li><li>RL fine-tuning amplifies patterns already present in the pretraining data and converges towards a dominant output distribution.</li><li>RL post-training on simpler questions can lead to performance gains on harder ones, indicating that reasoning capabilities generalize.</li></ul>

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
