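Large Language Models (LLMs) have shown success in reasoning tasks, but their inference is computationally inefficient: they tend to overthink (spend more tokens than a problem needs) or underthink (stop reasoning before a hard problem is solved).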
Researchers have introduced the Bayesian Budget Allocation Model (BBAM), which addresses these inefficiencies by modeling reasoning as a sequence of sub-questions.
A test-time framework called Plan-and-Budget has been proposed; it decomposes a complex query into sub-questions and allocates each sub-question a token budget based on its estimated complexity.
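The budgeting step can be illustrated with a minimal sketch. It assumes the sub-questions have already been produced and scored, and it splits a fixed token budget in proportion to those scores. The proportional rule, the function name `allocate_budgets`, and the `floor` parameter are illustrative assumptions, not the allocation schedule used by Plan-and-Budget itself.

```python
from typing import List


def allocate_budgets(complexities: List[float], total_budget: int, floor: int = 0) -> List[int]:
    """Split total_budget across sub-questions in proportion to complexity.

    Hypothetical helper: proportional allocation is an assumption made for
    illustration, not the schedule defined by the framework.
    """
    n = len(complexities)
    if n == 0:
        return []
    if any(c < 0 for c in complexities):
        raise ValueError("complexity scores must be non-negative")
    if floor * n > total_budget:
        raise ValueError("total_budget is too small for the requested floor")

    # Reserve the per-question floor, then share out what remains.
    remaining = total_budget - floor * n
    total_c = sum(complexities)
    if total_c == 0:
        shares = [remaining / n] * n  # no signal: split evenly
    else:
        shares = [remaining * c / total_c for c in complexities]

    # Round down, then hand leftover tokens to the largest fractional parts
    # so the budgets sum to total_budget.
    budgets = [floor + int(s) for s in shares]
    leftover = max(0, total_budget - sum(budgets))
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


# Three sub-questions with estimated complexities 1, 2 and 5,
# sharing a budget of 800 tokens.
print(allocate_budgets([1.0, 2.0, 5.0], 800))
```

Under this rule a harder sub-question receives a proportionally larger share, and the optional `floor` keeps easy sub-questions from being starved of tokens.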
Plan-and-Budget improves reasoning efficiency: it raises accuracy, reduces the number of tokens generated, and improves overall computational efficiency.