Researchers from ByteDance, Tsinghua University, and the University of Hong Kong have released an open-source reinforcement-learning system called DAPO (Decoupled Clip and Dynamic Sampling Policy Optimisation).
DAPO provides reinforcement-learning techniques for training large language models and aims to improve transparency and reproducibility in AI research.
According to the researchers, DAPO outperformed DeepSeek's GRPO method, scoring higher on the American Invitational Mathematics Examination (AIME) benchmark while requiring fewer training steps.
ByteDance continues to invest in AI by collaborating with top-level AI researchers and developing popular AI applications like its Doubao chatbot.