Researchers at Carnegie Mellon University propose a new technique, length-controlled policy optimization (LCPO), to control the length of chain-of-thought (CoT) reasoning in LLMs and optimize inference costs. LCPO conditions models to produce correct answers while keeping their "thoughts" within a predetermined token budget. Models trained with LCPO strike a balance between accuracy and cost, outperforming larger models at equal reasoning lengths.

Controlling CoT length matters because longer reasoning chains generally yield more accurate responses but create a compute bottleneck at scale. LCPO therefore introduces two training objectives: producing the correct result and keeping the CoT chain within a specified token length. LCPO-trained models learn to satisfy length constraints while optimizing reasoning performance, without relying on hand-crafted heuristics.

The researchers tested two variants, LCPO-Exact and LCPO-Max, on a 1.5B-parameter reasoning model. The resulting L1 models trade off token budget against reasoning performance effectively, reproducing the original model's performance at lower cost. L1 models significantly outperform S1 and even surpass GPT-4o at equal generation lengths on certain tasks. Models trained with LCPO also adapt their reasoning process to the available token budget, improving reasoning quality.
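To make the two training objectives concrete, here is a minimal sketch of what LCPO-style reward functions could look like: correctness combined with a length penalty for an exact target (LCPO-Exact) or a soft cap for a maximum budget (LCPO-Max). The function names, coefficients (`alpha`, `delta`), and clipping behavior are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of LCPO-style rewards (not the authors' code).
# Both variants combine answer correctness with a token-length constraint.

def lcpo_exact_reward(is_correct: bool, target_len: int, gen_len: int,
                      alpha: float = 0.0003) -> float:
    """Reward correctness, minus a penalty that grows with the gap between
    the generated CoT length and the exact target length."""
    return float(is_correct) - alpha * abs(target_len - gen_len)


def lcpo_max_reward(is_correct: bool, max_len: int, gen_len: int,
                    alpha: float = 0.0003, delta: float = 0.5) -> float:
    """Reward correctness only, scaled down smoothly as the generated CoT
    length approaches or exceeds the maximum token budget."""
    budget_factor = min(max(alpha * (max_len - gen_len) + delta, 0.0), 1.0)
    return float(is_correct) * budget_factor


# Example: a correct answer that overshoots a 1,000-token budget by 200 tokens.
print(lcpo_exact_reward(True, target_len=1000, gen_len=1200))  # 1.0 - 0.06 = 0.94
print(lcpo_max_reward(True, max_len=1000, gen_len=1200))       # 1.0 * 0.44 = 0.44
```

Either reward can then be plugged into a standard RL fine-tuning loop, so the model learns to hit (or stay under) the token budget specified in its prompt while still being rewarded for correct answers.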