OpenAI has launched reinforcement fine-tuning (RFT) for its o1 models, a step beyond traditional supervised fine-tuning. With RFT, models do not simply imitate labeled examples; they reason and learn from feedback, letting them handle domain-specific tasks with minimal data. Early adopters report strong results, from identifying genetic mutations behind rare diseases to training models for high-stakes legal and insurance work. However, the approach may struggle in subjective or creative domains where there is no clear consensus on a correct answer. The RFT alpha program is open to select organizations, and OpenAI envisions the technique advancing mathematics, research, and agent-based decision-making.
OpenAI's RFT enables organizations to train models using reinforcement learning to handle domain-specific tasks with minimal data, sometimes as few as 12 examples. RFT improves reasoning and accuracy in expert-level tasks by using reference answers to evaluate and refine model outputs.
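For illustration, below is a minimal sketch of what a small RFT training set with reference answers might look like, written as JSONL records. This is an assumption, not OpenAI's published format: the alpha schema is not public, and the field names here ("messages", "reference_answer") are hypothetical.

```python
import json

# Hypothetical sketch only: each record pairs a domain-specific prompt with a
# reference answer that a grader can score the model's output against.
examples = [
    {
        "messages": [
            {
                "role": "user",
                "content": "Which enzyme catalyzes the committed step of glycolysis?",
            }
        ],
        "reference_answer": "phosphofructokinase-1",
    },
    # ...a few dozen such records can be enough, per OpenAI's early results.
]

# Write the examples as JSONL, one record per line.
with open("rft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```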
Justin Reese, a computational biologist, highlighted RFT's transformative potential in healthcare, particularly for rare diseases affecting millions. "The ability to combine domain expertise with systematic reasoning over biomedical data is game-changing," he said.
OpenAI has also released the full version of o1 and a new $200-per-month ChatGPT Pro plan, which includes unlimited access to o1, o1-mini, and GPT-4o, along with advanced voice mode. ChatGPT Pro offers everything in the Plus plan plus o1 pro mode, which uses additional compute to produce better answers to the hardest problems. OpenAI has also announced new developer-centric features, including structured outputs, function calling, developer messages, and image understanding in the API.
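As a rough illustration of how these developer features fit together, the sketch below uses the openai Python SDK to send a developer message and request a structured, JSON-schema-constrained output. The model name, prompt, and schema are assumptions for the example, and availability of these features depends on your API access.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A developer message steers the model's behavior; the JSON schema constrains
# the output to a fixed structure. All names below are illustrative.
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "developer",
         "content": "You review insurance clauses. Answer only from the supplied text."},
        {"role": "user",
         "content": "Clause: 'Either party may terminate with 30 days' written notice.' "
                    "Does this permit early termination?"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "clause_review",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "permits_early_termination": {"type": "boolean"},
                    "rationale": {"type": "string"},
                },
                "required": ["permits_early_termination", "rationale"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON matching the schema above
```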
Early adopters have achieved remarkable results with RFT, from identifying genetic mutations that cause rare diseases to training models for high-stakes applications such as law and insurance.
OpenAI is using reinforcement learning to train AI models to reason and think through problems, an advance beyond traditional fine-tuning. Unlike traditional supervised fine-tuning, RFT lets the model explore different solutions rather than imitate fixed labels, which allows it to focus on improving its reasoning.
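To make that distinction concrete, here is a toy sketch (not OpenAI's actual grading code) of feedback-based scoring: a simple grader gives credit to any candidate answer that matches the reference, so different phrasings of a correct solution can be rewarded rather than only one fixed label.

```python
def grade(candidate: str, reference: str) -> float:
    """Toy grader: fraction of reference tokens that appear in the candidate answer."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    return len(cand & ref) / len(ref) if ref else 0.0

# Several different phrasings can earn reward; only the wrong answer scores zero.
reference = "phosphofructokinase-1"
candidates = [
    "phosphofructokinase-1",                         # exact answer
    "the committed step uses phosphofructokinase-1", # same answer, different wording
    "hexokinase",                                    # wrong answer
]
for c in candidates:
    print(f"reward={grade(c, reference):.2f}  ->  {c}")

# In supervised fine-tuning, only an exact labeled target would count; here the
# grader's score is the training signal, so the model is free to explore how it
# reaches a correct answer.
```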
Interestingly, with RFT, significant performance improvements can be achieved with just a few dozen examples because the model learns from feedback rather than needing to see all possible scenarios. However, the performance of RFT depends heavily on the quality of the training data and the design of the task.
OpenAI has also announced that its RFT alpha program is now open to select organizations to integrate domain-specific knowledge with the new approach.
OpenAI aims to refine RFT based on feedback from early adopters and plans to release it publicly in 2025. Beyond its initial applications, OpenAI envisions RFT models advancing fields like mathematics, research, and agent-based decision-making.