Strong mathematical reasoning is a key capability for Large Language Models (LLMs).
A new paper introduces a practical training recipe that combines Supervised Fine-Tuning (SFT) with Reinforcement Learning (RL) to maximize both accuracy and efficiency.
The methodology first extends SFT for up to 10 epochs to raise accuracy, then applies RL from online inference via Group Relative Policy Optimization (GRPO) to improve token efficiency without compromising performance.
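The GRPO step mentioned above scores a group of sampled solutions per problem and normalizes each reward against the group's statistics. Below is a minimal sketch of that group-relative advantage computation; the reward values and the per-token efficiency penalty are hypothetical illustrations, not the paper's actual reward design.

```python
# Sketch of GRPO's group-relative advantage computation
# (illustrative only; the paper's reward shaping may differ).
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Normalize each sampled response's reward against its group's
    mean and standard deviation, as GRPO does per prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Hypothetical rewards for 4 sampled solutions to one problem:
# correct answers score near 1.0, with a small deduction for longer
# outputs to encourage token efficiency; wrong answers score 0.0.
rewards = [1.0, 0.9, 0.0, 0.0]
print(grpo_advantages(rewards))
```

Because advantages are centered within each group, shorter correct solutions earn higher advantages than verbose correct ones, which is how the RL phase can trim tokens without sacrificing accuracy.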
Experiments demonstrate the effectiveness of this approach: the resulting models achieve top-tier performance on benchmarks such as the AI Mathematical Olympiad, and the recipe offers a blueprint for developing advanced mathematical reasoning models.