Researchers at Carnegie Mellon University propose a new technique, length-controlled policy optimization (LCPO), to control the length of chain-of-thought (CoT) reasoning in LLMs and optimize inference costs. LCPO conditions models to produce correct answers while keeping their "thoughts" within a predetermined token budget. Models trained with LCPO strike a balance between accuracy and cost, outperforming larger models at equal reasoning lengths.

Controlling CoT length matters because longer reasoning chains generally yield more accurate responses but create a compute bottleneck at scale. LCPO therefore introduces two training objectives: producing the correct result and keeping the CoT chain within a specified token length. LCPO-trained models learn to satisfy length constraints while optimizing reasoning performance, without relying on hand-crafted heuristics.

The researchers tested two variants, LCPO-Exact and LCPO-Max, on a 1.5B-parameter reasoning model. The resulting L1 models trade off token budget against reasoning performance effectively, reproducing the original model's performance at lower cost. L1 models significantly outperform S1 and even surpass GPT-4o at equal generation lengths on certain tasks. Models trained with LCPO also adapt their reasoning process to the available token budget, improving reasoning quality.
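To make the two training objectives concrete, here is a minimal sketch of what LCPO-style reward functions could look like: correctness combined with a length penalty for an exact target (LCPO-Exact) or a soft cap for a maximum budget (LCPO-Max). The function names, coefficients (`alpha`, `delta`), and clipping behavior are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of LCPO-style rewards (not the authors' code).
# Both variants combine answer correctness with a token-length constraint.

def lcpo_exact_reward(is_correct: bool, target_len: int, gen_len: int,
                      alpha: float = 0.0003) -> float:
    """Reward correctness, minus a penalty that grows with the gap between
    the generated CoT length and the exact target length."""
    return float(is_correct) - alpha * abs(target_len - gen_len)


def lcpo_max_reward(is_correct: bool, max_len: int, gen_len: int,
                    alpha: float = 0.0003, delta: float = 0.5) -> float:
    """Reward correctness only, scaled down smoothly as the generated CoT
    length approaches or exceeds the maximum token budget."""
    budget_factor = min(max(alpha * (max_len - gen_len) + delta, 0.0), 1.0)
    return float(is_correct) * budget_factor


# Example: a correct answer that overshoots a 1,000-token budget by 200 tokens.
print(lcpo_exact_reward(True, target_len=1000, gen_len=1200))  # 1.0 - 0.06 = 0.94
print(lcpo_max_reward(True, max_len=1000, gen_len=1200))       # 1.0 * 0.44 = 0.44
```

Either reward can then be plugged into a standard RL fine-tuning loop, so the model learns to hit (or stay under) the token budget specified in its prompt while still being rewarded for correct answers.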