<ul><li>Post-training is important for enhancing reasoning capabilities of large language models.</li><li>Supervised fine-tuning (SFT) is efficient but may lead to overfitting in larger models.</li><li>Reinforcement fine-tuning (RFT) generally yields better generalization but depends heavily on base model strength.</li><li>Unified Fine-Tuning (UFT) combines SFT and RFT into an integrated process, outperforming both methods regardless of model sizes.</li></ul>

UFT: Unifying Supervised and Reinforcement Fine-Tuning

Discover more