Test-time scaling aims to enhance the reasoning of Large Language Models (LLMs) by using more compute at inference time, enabling extrapolation to improved performance on challenging problems.
Existing reasoning models generally do not extrapolate well; one way to enable extrapolation is to train the LLM to engage in in-context exploration. In-context exploration means training the LLM to spend its test-time compute productively, chaining operations and testing multiple hypotheses before committing to an answer.
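As a rough illustration, the loop below makes this behavior explicit. In a trained model the exploration emerges inside a single chain of thought rather than through external orchestration; `generate` and `verify` here are hypothetical callables standing in for the model's own skills, not a real API.

```python
from typing import Callable, Optional, Tuple

def explore(
    problem: str,
    generate: Callable[[str, list], Tuple[str, int]],  # returns (hypothesis, tokens spent)
    verify: Callable[[str, str], Tuple[bool, int]],    # returns (passes?, tokens spent)
    token_budget: int,
) -> Optional[str]:
    """Sketch of in-context exploration: chain generation and verification,
    testing hypotheses until one survives or the token budget runs out."""
    trace: list = []   # record of attempts, playing the role of the growing context
    spent = 0
    answer = None
    while spent < token_budget:
        hypothesis, g_cost = generate(problem, trace)  # propose a candidate
        ok, v_cost = verify(problem, hypothesis)       # test it before answering
        spent += g_cost + v_cost
        trace.append((hypothesis, ok))
        answer = hypothesis
        if ok:  # stop once a hypothesis survives verification
            break
    return answer
```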
The proposed recipe, e3, combines three ingredients: chaining skills, leveraging negative gradients during RL training, and coupling task difficulty with the training token budget. Together these enable in-context exploration, yielding improved performance and extrapolation of test-time compute for LLMs.
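To make two of these ingredients concrete, below is a minimal sketch, not the paper's implementation. The REINFORCE-style loss and the linear budget schedule are illustrative assumptions; `logprobs`, `rewards`, and `solve_rate` are hypothetical inputs standing in for per-trace log-probabilities, 0/1 correctness rewards, and the model's measured success rate on a task.

```python
import torch

def reinforce_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Policy-gradient loss in which incorrect traces receive negative
    advantage, so their probability is pushed down rather than ignored."""
    advantages = rewards - rewards.mean()  # failures end up below the mean
    return -(advantages * logprobs).mean()

def token_budget(solve_rate: float, base: int = 4096, cap: int = 16384) -> int:
    """Couple task difficulty with the training token budget: harder tasks
    (lower solve rate) get a longer budget, leaving room to chain skills
    and explore before answering. The linear schedule is an assumption."""
    difficulty = 1.0 - solve_rate
    return int(base + difficulty * (cap - base))
```

In this sketch, failed traces actively reduce their own likelihood instead of being filtered out, and the budget grows with difficulty so harder problems get more room for in-context exploration.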