Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations.
A Code Interpreter (CI) brings external computational knowledge beyond an LRM's internal text representations, but directly combining the two is inefficient.
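To make the LRM-CI interaction concrete, here is a minimal sketch of the usual loop: the model generates reasoning until it emits a code block, the block is executed, and the interpreter's output is fed back so generation can resume. The `generate` callable, the `<code>`/`<output>` delimiters, and `run_python` are illustrative assumptions, not CoRT's actual implementation.

```python
import contextlib
import io
import re

# Assumed delimiters for model-emitted code; the paper's actual
# format (e.g. markdown-style fences) may differ.
CODE_BLOCK = re.compile(r"<code>\n(.*?)\n</code>", re.DOTALL)

def run_python(code: str) -> str:
    """Execute a model-emitted snippet and capture its stdout.
    A real system would sandbox this; bare exec() is illustrative only."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception as exc:
        return f"Error: {exc}"
    return buf.getvalue()

def reason_with_ci(generate, prompt: str, max_rounds: int = 8) -> str:
    """Interleave model generation with code execution.

    `generate(text, stop)` is an assumed callable that returns the model's
    continuation of `text`, halting once the stop string is emitted.
    """
    trace = prompt
    for _ in range(max_rounds):
        # The model reasons in natural language and may emit a code block,
        # after which it expects an <output> block: our cue to execute.
        chunk = generate(trace, stop="<output>")
        trace += chunk
        blocks = CODE_BLOCK.findall(chunk)
        if not blocks:
            break  # no code requested; the reasoning has concluded
        result = run_python(blocks[-1])
        # Feed the interpreter's result back so generation can resume.
        trace += f"<output>\n{result}\n</output>\n"
    return trace
```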
CoRT is a post-training framework that teaches LRMs to use CI effectively and efficiently for complex mathematical operations.
To address data scarcity, CoRT synthesizes code-integrated reasoning data through Hint-Engineering, which strategically inserts hints at appropriate positions in the reasoning trace to optimize LRM-CI interaction.
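As a rough illustration of the hint-insertion idea, the sketch below scans a natural-language reasoning trace for an error-prone arithmetic step and inserts a hint nudging the model toward code. The trigger pattern and hint wording are invented for illustration and are not the paper's actual hints.

```python
import re

# An illustrative hint in the spirit of Hint-Engineering; the paper's
# actual hint texts and insertion policy may differ.
CODE_HINT = ("\nWait, this calculation is error-prone by hand. "
             "I should write Python code to compute it.\n")

# A crude trigger: a chain of two or more arithmetic operations.
HEAVY_ARITHMETIC = re.compile(r"\d+(\s*[+\-*/^]\s*\d+){2,}")

def inject_hint(trace: str) -> str:
    """Insert a code-use hint just before the first heavy computation.

    Minimal sketch: real Hint-Engineering places different hint types at
    strategic positions, e.g. prompting code for calculation-heavy steps
    while discouraging unnecessary code elsewhere.
    """
    lines = trace.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if HEAVY_ARITHMETIC.search(line):
            return "".join(lines[:i]) + CODE_HINT + "".join(lines[i:])
    return trace  # no heavy computation found; leave the trace unchanged
```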
Thirty high-quality samples are manually created, and models ranging from 1.5B to 32B parameters are post-trained on them with supervised fine-tuning, rejection fine-tuning, and reinforcement learning.
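For context, rejection fine-tuning typically keeps only sampled trajectories whose final answers verify as correct. The sketch below shows this generic recipe, with `sample` and `is_correct` as assumed helpers rather than CoRT's actual pipeline.

```python
def rejection_filter(problems, sample, is_correct, k: int = 8):
    """Collect correct self-generated trajectories for further fine-tuning.

    `sample(problem)` draws one code-integrated reasoning trajectory and
    `is_correct(problem, trajectory)` checks its final answer; both are
    assumed helpers, since CoRT's exact pipeline may differ.
    """
    kept = []
    for problem in problems:
        for _ in range(k):
            trajectory = sample(problem)
            if is_correct(problem, trajectory):
                # Correct trajectories become supervised training targets.
                kept.append((problem, trajectory))
    return kept
```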
Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets.
They also use about 30% fewer tokens at 32B and 50% fewer at 1.5B compared with models reasoning in natural language alone.
Together, these results demonstrate that CoRT improves both the accuracy and the efficiency of LRMs on mathematical reasoning tasks.
The models and code for CoRT are available at https://github.com/ChengpengLi1003/CoRT.