Allocating more compute to large language model (LLM) reasoning has been shown to improve effectiveness, but it also increases inference time.
This paper explores whether LLMs can become faster at reasoning through recurring exposure to relevant tasks.
The study formalizes the problem setting of LLM reasoning speedup in terms of task relevance and compute budget.
Experiments show that LLMs can reason faster by drawing on past experience, achieving up to a 56% reduction in compute cost with suitable memory and reasoning methods.