Source: Arxiv
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

  • Researchers propose a novel inference-aware fine-tuning paradigm for large language models (LLMs).
  • Rather than optimizing the model in isolation, the paradigm directly optimizes the performance of the inference-time strategy that will be used at deployment.
  • Imitation learning and reinforcement learning methods are devised to tackle the non-differentiable argmax operator within the Best-of-N (BoN) inference strategy.
  • Experiments show that BoN-aware fine-tuning improves performance and makes better use of inference-time compute; a minimal sketch of the BoN selection step follows below.
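
To make the Best-of-N strategy concrete, here is a minimal sketch in Python. It assumes hypothetical `generate()` and `score()` placeholders standing in for the policy sampler and a verifier or reward model; neither is from the paper, and the fine-tuning methods themselves are not shown.

```python
import random  # only used by the toy stubs below


def best_of_n(prompt, generate, score, n=8):
    """Best-of-N (BoN) sampling: draw n candidate responses from the
    policy and return the one the verifier scores highest.

    The argmax over candidates is the non-differentiable step that the
    paper's imitation-learning and RL methods address during fine-tuning.
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [score(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]


# Toy usage with stub functions (illustration only).
if __name__ == "__main__":
    def generate(prompt):
        return f"{prompt} -> candidate answer {random.randint(0, 100)}"

    def score(prompt, response):
        return random.random()  # stand-in for a learned verifier / reward model

    print(best_of_n("2 + 2 = ?", generate, score, n=4))
```

Larger n raises the chance that at least one candidate is good, but each query then costs n forward passes, which is the inference-time compute trade-off the BoN-aware fine-tuning targets.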
