<ul><li>PrimeIntellect has released INTELLECT-2, a 32-billion-parameter reasoning model post-trained with Group Relative Policy Optimization (GRPO) in a fully decentralized, asynchronous reinforcement learning framework.</li><li>INTELLECT-2 outperforms the QwQ-32B model on key reasoning benchmarks and is open-sourced under the Apache 2.0 license to support reproducibility and ongoing research.</li><li>INTELLECT-2's training infrastructure includes PRIME-RL for asynchronous RL, SHARDCAST for efficient weight propagation, and TOPLOC for verifying inference in distributed systems.</li><li>The model was fine-tuned with reinforcement learning on 285,000 tasks covering reasoning, coding, and mathematical problem solving, and its results indicate that a decentralized post-training pipeline can deliver strong performance.</li></ul>
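To make the GRPO step mentioned above concrete, here is a minimal sketch of the group-relative advantage computation in its standard form: several completions are sampled for one prompt, and each reward is normalized against the mean and standard deviation of that group. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions; INTELLECT-2's exact recipe may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same prompt.

    Hypothetical helper for illustration only; not taken from PRIME-RL.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumption)
    # eps keeps the division finite when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored by a binary verifier
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.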

PrimeIntellect Releases INTELLECT-2: A 32B Reasoning Model Trained via Distributed Asynchronous Reinforcement Learning
