PrimeIntellect has released INTELLECT-2, a 32-billion-parameter reasoning model post-trained with Group Relative Policy Optimization (GRPO) in a fully decentralized, asynchronous reinforcement learning framework.
INTELLECT-2 outperforms the QwQ-32B model on key reasoning benchmarks and is open-sourced under the Apache 2.0 license for reproducibility and ongoing research.
INTELLECT-2's training infrastructure comprises PRIME-RL for asynchronous RL, SHARDCAST for efficient weight propagation, and TOPLOC for verification in distributed systems.
The model was fine-tuned with reinforcement learning on 285,000 tasks covering reasoning, coding, and math problem solving, demonstrating strong performance from a decentralized post-training pipeline.
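
GRPO, the policy optimization method mentioned above, scores each sampled completion relative to the other completions drawn for the same prompt rather than against a learned value function. The sketch below shows only that group-relative advantage step in its commonly described form. The function name, the use of the population standard deviation, and the eps term are illustrative assumptions; it is not INTELLECT-2's actual training code and does not reflect any modifications made in PRIME-RL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions against the group.

    Hypothetical helper for illustration: each reward is centered on the group
    mean and scaled by the group standard deviation (population form assumed).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumed stabilizer) avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a binary verifier (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive positive advantages and the rest receive negative ones, so a group in which every completion earns the same reward contributes no learning signal.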