Enhancing the reasoning capabilities of Large Language Models (LLMs) is a central focus of current research.
A newly proposed approach, TeaR, teaches LLMs to reason more effectively by combining careful data curation with reinforcement learning.
TeaR aims to improve general reasoning ability by guiding models to discover optimal reasoning paths through code-related tasks.
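To make the reinforcement-learning side of this idea concrete, below is a minimal sketch of a verifiable correctness reward of the kind commonly used when fine-tuning LLMs on reasoning tasks. It is illustrative only: the `\boxed{...}` answer format, the function names, and the binary reward scheme are assumptions for this example, not details taken from TeaR itself.

```python
import re


def extract_final_answer(response: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a model response, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    return matches[-1].strip() if matches else None


def correctness_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the reference, else 0.0."""
    answer = extract_final_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


if __name__ == "__main__":
    sample = "Step 1: ... Step 2: ... so the result is \\boxed{42}"
    print(correctness_reward(sample, "42"))  # prints 1.0
```

In such a setup, the reward signal depends only on whether the final answer (or, for code-related tasks, whether generated code passes its checks) is correct, which lets the policy explore and reinforce whichever intermediate reasoning paths lead to correct outcomes.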
Extensive experiments demonstrate significant performance gains: TeaR achieves a 35.9% improvement with Qwen2.5-7B and a 5.9% improvement with R1-Distilled-7B across reasoning benchmarks.