Allen AI has released Tülu 3, a new family of models that match or even beat DeepSeek on key benchmarks.
Tülu 3 is open source: Allen AI has released the complete training pipeline, the code, and the reinforcement learning method behind these results, Reinforcement Learning with Verifiable Rewards (RLVR).
Tülu 3 is built with a four-stage training process: strategic data selection, building better responses (supervised fine-tuning), learning from comparisons (preference tuning), and Reinforcement Learning with Verifiable Rewards.
RLVR deserves attention as a technical advance: it replaces subjective reward models with concrete verification. The model is rewarded only when its answer can be checked and found correct, so the feedback is binary, with no room for partial credit or fuzzy evaluation.
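To make the idea of binary, verifiable feedback concrete, here is a minimal sketch of a reward function for math-style prompts with a known numeric answer. It is an illustration under stated assumptions, not Allen AI's implementation: the names `extract_final_answer` and `verifiable_reward` are hypothetical, and treating the last number in the completion as the model's final answer is a simplifying assumption.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number found in the completion, or None.

    Assumption for this sketch: the final number in the text is the
    model's answer. Thousands separators such as "1,000" are removed first.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(completion: str, ground_truth: str,
                      reward_value: float = 1.0) -> float:
    """Binary reward: reward_value if the answer verifies, otherwise 0.0.

    There is no partial credit. A near miss scores the same as a
    completion that contains no answer at all.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        is_correct = float(answer) == float(ground_truth)
    except ValueError:
        # Non-numeric ground truth cannot be verified by this checker.
        return 0.0
    return reward_value if is_correct else 0.0


if __name__ == "__main__":
    print(verifiable_reward("3 + 4 = 7, so the answer is 7.", "7"))  # 1.0
    print(verifiable_reward("The answer is roughly 6.9", "7"))       # 0.0
    print(verifiable_reward("I am not sure.", "7"))                  # 0.0
```

In a reinforcement learning loop, a score like this would stand in for the output of a learned reward model. Because the check is deterministic, the policy cannot earn reward by producing answers that merely look plausible.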
The 405B-parameter version of Tülu 3 competes directly with top models in math, coding, and instruction following.
Allen AI has also documented the full development process, including training pipelines, data processing tools, evaluation frameworks, and implementation specifications.
This open approach lets developers build on proven methods, which accelerates innovation across the field and could spark a new wave of AI development.
The success of Tülu 3 is a big moment for open-source AI development: when open models match or exceed proprietary alternatives, the industry changes.
Allen AI's verifiable rewards and multi-stage training techniques pave the way for future AI development, providing a foundation for teams to build upon and push performance even higher.
Tülu 3's advances in multi-stage training and verifiable rewards hint at what is coming: a new wave of AI development has just begun.