Large language models (LLMs) trained with chain-of-thought (CoT) supervision have shown remarkable reasoning capabilities.
A new method called EPiC has been introduced to condense CoT traces for resource-efficient reasoning training.
EPiC selectively retains problem understanding and solution convergence stages in the reasoning trace, reducing training time by over 34% without compromising reasoning accuracy.
The approach aims to achieve lossless reasoning supervision while making the training of reasoning models more efficient.
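The core idea of keeping only the opening (problem understanding) and closing (solution convergence) stages of a trace can be sketched as simple span selection. The function below is a hypothetical illustration, not EPiC's actual algorithm: the `head_frac` and `tail_frac` hyperparameters and the sentence-level split are assumptions for the sketch.

```python
def condense_trace(trace, head_frac=0.3, tail_frac=0.2):
    """Keep the beginning (problem understanding) and end
    (solution convergence) of a chain-of-thought trace, dropping
    the middle exploration steps.

    trace: list of reasoning steps (e.g., sentences of the CoT).
    head_frac / tail_frac: assumed hyperparameters controlling
    how much of each end to retain.
    """
    n = len(trace)
    head = max(1, int(n * head_frac))
    tail = max(1, int(n * tail_frac))
    if head + tail >= n:
        # Trace is too short to condense; keep it whole.
        return list(trace)
    return trace[:head] + trace[-tail:]


# Example: a 10-step trace is condensed to its first 3 and last 2 steps.
steps = [f"step {i}" for i in range(10)]
condensed = condense_trace(steps)
```

In practice, a supervised fine-tuning pipeline would apply such a filter to each CoT example before tokenization, so the model trains on shorter targets per example.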