Source: Arxiv

Self-Training Large Language Models with Confident Reasoning

  • Recent studies have explored self-training methods that improve the reasoning capabilities of Large Language Models (LLMs) using pseudo-labels generated by the LLMs themselves.
  • Confidence-based self-training fine-tunes LLMs to prefer reasoning paths that lead to high-confidence answers, typically relying on majority voting over final answers to estimate confidence.
  • The proposed self-training method, CORE-PO, instead uses reasoning-level confidence to identify high-quality reasoning paths, leading to improved accuracy on various benchmarks.
  • CORE-PO fine-tunes LLMs through policy optimization to prefer high-confidence reasoning paths, demonstrating improved performance over existing self-training methods (a minimal sketch of the idea follows below).
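The sketch below is a rough illustration, not the paper's implementation. It assumes reasoning-level confidence can be proxied by the mean token log-probability of a sampled reasoning path, uses a toy tensor "policy" in place of an actual LLM, and applies a simplified DPO-style preference loss (without a reference model) so the policy is nudged toward the more confident path. The helper names (path_confidence, preference_loss, path_logprobs) are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def path_confidence(token_logprobs: torch.Tensor) -> torch.Tensor:
    # Reasoning-level confidence proxied by the mean per-token log-probability
    # of the path (an assumption for illustration; the paper defines its own measure).
    return token_logprobs.mean()

def preference_loss(logp_preferred: torch.Tensor,
                    logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    # Simplified DPO-style objective: widen the margin between the
    # high-confidence and low-confidence reasoning paths.
    return -F.logsigmoid(beta * (logp_preferred - logp_rejected))

def path_logprobs(logits: torch.Tensor, path: torch.Tensor) -> torch.Tensor:
    # Per-token log-probabilities of a sampled token sequence under the policy.
    return torch.log_softmax(logits, dim=-1)[torch.arange(len(path)), path]

# Toy "policy": per-position logits over a small vocabulary, standing in for an LLM.
vocab_size, path_len = 16, 8
policy_logits = torch.randn(path_len, vocab_size, requires_grad=True)
optimizer = torch.optim.Adam([policy_logits], lr=1e-2)

# Two self-generated "reasoning paths" (token id sequences) for the same prompt.
path_a = torch.randint(0, vocab_size, (path_len,))
path_b = torch.randint(0, vocab_size, (path_len,))

logp_a = path_logprobs(policy_logits, path_a)
logp_b = path_logprobs(policy_logits, path_b)

# Rank the paths by reasoning-level confidence and prefer the more confident one.
if path_confidence(logp_a) >= path_confidence(logp_b):
    preferred, rejected = logp_a.sum(), logp_b.sum()
else:
    preferred, rejected = logp_b.sum(), logp_a.sum()

loss = preference_loss(preferred, rejected)
loss.backward()
optimizer.step()
```

In practice the toy policy would be replaced by an actual LLM forward pass and confidence estimated as the paper prescribes; the point here is only the pattern of ranking self-generated reasoning paths by confidence and then applying a preference-style policy update.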
