Reasoning is essential for large language models (LLMs) to perform well on a wide range of tasks, but methods like Chain-of-Thought (CoT) reasoning incur high token costs.
The reasoning processes of current LLMs are often unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt.
The choice of token budget, however, is crucial for effective compression, motivating a token-budget-aware LLM reasoning framework.
This framework dynamically adjusts the number of reasoning tokens based on problem complexity, reducing token costs in CoT reasoning while maintaining performance.
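As a rough illustration, the sketch below shows how a token budget might be injected into a CoT prompt. The length-based budget heuristic, the prompt wording, and the `call_llm` client are illustrative assumptions, not the framework's actual estimator.

```python
def estimate_budget(question: str, base: int = 50, per_char: float = 0.5, cap: int = 500) -> int:
    """Toy heuristic: longer (presumably harder) questions get a larger reasoning budget.

    This length-based proxy is an assumption for illustration only; the framework
    itself derives the budget from the problem's reasoning complexity.
    """
    return min(cap, base + int(per_char * len(question)))


def budgeted_cot_prompt(question: str, budget: int) -> str:
    """Build a Chain-of-Thought prompt that states an explicit token budget."""
    return (
        f"{question}\n"
        f"Let's think step by step and use fewer than {budget} tokens for your reasoning, "
        f"then give the final answer."
    )


# Usage with a hypothetical `call_llm(prompt) -> str` client:
# question = "If a train travels 120 km in 1.5 hours, what is its average speed?"
# budget = estimate_budget(question)
# answer = call_llm(budgeted_cot_prompt(question, budget))
```

The key design choice is that the budget is chosen per problem rather than fixed globally, so simple problems receive tight budgets while harder ones retain enough tokens to reason through.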