Source: VentureBeat

New technique helps LLMs rein in CoT lengths, optimizing reasoning without exploding compute costs

  • Researchers at Carnegie Mellon University propose length controlled policy optimization (LCPO), a technique that controls the length of chain-of-thought (CoT) reasoning in LLMs to optimize inference costs.
  • LCPO conditions models to provide correct answers while keeping their 'thoughts' within a predetermined token budget.
  • Models trained with LCPO balance accuracy against cost, outperforming larger models at equal reasoning lengths.
  • Controlling CoT length is crucial as longer CoT chains lead to more accurate responses but create a compute bottleneck at scale.
  • LCPO introduces two training objectives: obtaining the correct result and keeping the CoT chain within a specific token length (see the reward sketch after this list).
  • LCPO-trained models learn to satisfy length constraints while optimizing reasoning performance without heuristics.
  • The researchers tested two versions of LCPO, LCPO-exact and LCPO-max, on a 1.5B-parameter reasoning model.
  • L1 models based on LCPO effectively balance token budget against reasoning performance, reproducing the original model's performance at a lower cost.
  • L1 models significantly outperform S1 and even surpass GPT-4o at equal generation lengths on certain tasks.
  • Models trained with LCPO adapt their reasoning process to the token budget they are given, improving reasoning quality.
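
At the heart of LCPO is a reward that combines answer correctness with a length term. The Python sketch below illustrates one way such rewards could be structured; the coefficient alpha, the function names, and the exact penalty forms are illustrative assumptions, not the paper's precise formulation.

```python
# Illustrative LCPO-style rewards (a sketch; alpha and the exact penalty
# forms are assumptions, not the paper's precise formulation).

def lcpo_exact_reward(is_correct: bool, used_tokens: int,
                      target_tokens: int, alpha: float = 0.001) -> float:
    """Reward correctness, penalizing any deviation from the target CoT length."""
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return correctness - length_penalty


def lcpo_max_reward(is_correct: bool, used_tokens: int,
                    budget_tokens: int, alpha: float = 0.001) -> float:
    """Reward correctness, penalizing only tokens spent beyond the budget."""
    correctness = 1.0 if is_correct else 0.0
    overshoot_penalty = alpha * max(0, used_tokens - budget_tokens)
    return correctness - overshoot_penalty


if __name__ == "__main__":
    # A correct 900-token chain vs. a correct 1,600-token chain against a
    # 1,000-token target: the shorter chain earns the higher exact reward.
    print(lcpo_exact_reward(True, 900, 1000))   # 0.9
    print(lcpo_exact_reward(True, 1600, 1000))  # 0.4
    print(lcpo_max_reward(True, 900, 1000))     # 1.0 (under budget, no penalty)
```

In this sketch, LCPO-exact penalizes any deviation from the target length, while LCPO-max penalizes only generations that exceed the budget, mirroring the two variants tested above.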
