Marktechpost


ByteDance Research Releases DAPO: A Fully Open-Sourced LLM Reinforcement Learning System at Scale

  • Researchers from ByteDance, Tsinghua University, and the University of Hong Kong have released DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), an open-source reinforcement learning system for large language models (LLMs).
  • DAPO aims to enhance the reasoning abilities of LLMs and promote reproducibility by openly sharing algorithmic details, training procedures, and datasets.
  • DAPO incorporates four core innovations: Clip-Higher, Dynamic Sampling, Token-level Policy Gradient Loss, and Overlong Reward Shaping.
  • Using the Qwen2.5-32B base model, DAPO scores 50 points on the American Invitational Mathematics Examination (AIME) 2024 benchmark, surpassing the 47 points of DeepSeek-R1-Zero-Qwen-32B while using half the training steps.
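
The four techniques named above can be illustrated with a small sketch. This is not ByteDance's implementation: the function names, the default clip bounds (0.2 and 0.28), and the linear shape of the length penalty are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import math


def keep_prompt(rewards):
    """Dynamic Sampling (sketch): keep a prompt only if its sampled
    answers received different rewards, so the group carries a
    non-zero learning signal."""
    return len(set(rewards)) > 1


def group_advantages(rewards):
    """Normalize each reward against its own sample group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-8) for r in rewards]


def clipped_token_loss(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """Clip-Higher plus token-level loss (sketch).

    ratios: one list of per-token probability ratios per response.
    advantages: one advantage per response.
    The upper clip bound is looser than the lower one (assumed values),
    and the loss is averaged over all tokens in the group rather than
    per response, so long responses weigh more.
    """
    total, n_tokens = 0.0, 0
    for resp_ratios, adv in zip(ratios, advantages):
        for r in resp_ratios:
            clipped = min(max(r, 1.0 - eps_low), 1.0 + eps_high)
            total += min(r * adv, clipped * adv)
            n_tokens += 1
    return -total / n_tokens


def overlong_penalty(length, max_len, buffer_len):
    """Overlong Reward Shaping (sketch): no penalty up to
    max_len - buffer_len, then a linear ramp down to -1 at max_len,
    and a flat -1 beyond it."""
    soft_start = max_len - buffer_len
    if length <= soft_start:
        return 0.0
    if length <= max_len:
        return (soft_start - length) / buffer_len
    return -1.0
```

In a training loop of this shape, prompts failing `keep_prompt` would be resampled or skipped, and `overlong_penalty` would be added to the task reward before advantages are computed.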
