Reinforcement Learning Won Again, This Time With Microsoft


Analyticsindiamag


Microsoft's Phi family, a product of the company's AI research, comprises lightweight, high-performing models that outshine competitors.
Phi-4 Reasoning, a 14-billion-parameter model, was improved using supervised fine-tuning and reinforcement learning.
These models excel in coding, math, and scientific tasks, surpassing larger models like DeepSeek R1.
The success is attributed to high-quality training data: more than 1.4 million prompts, paired with answers generated by OpenAI models.
Reinforcement learning (RL) lets the Phi models reach an answer by varied routes, as long as the final outcome is correct.
The RL process in Microsoft's models focuses on mathematical reasoning by incentivising correctness and proper formatting.
In Microsoft's models, RL requires less data than supervised fine-tuning while still improving accuracy across evaluations.
Despite these successes, challenges remain, including high resource consumption, slower response times, and contradictions between reasoning steps.
Issues such as reward hacking, and discrepancies between a model's stated reasoning chain and the process it actually follows, raise concerns in the AI community.
Efforts to enhance interpretability and safety of reasoning models continue, aiming to understand model behaviors and improve overall performance.
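
The reward idea summarised above, where any route to the answer is accepted and only correctness and proper formatting are incentivised, can be illustrated with a small rule-based reward function. This is a minimal sketch and not Microsoft's actual implementation: the `<think>` tags, the `\boxed{}` answer convention, the function name `outcome_reward` and the reward values are all assumptions chosen for illustration.

```python
import re

# Assumed (hypothetical) response format: reasoning inside <think>...</think>,
# followed by exactly one final answer written as \boxed{...}.
THINK_RE = re.compile(r"^<think>(.+?)</think>(.*)$", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def outcome_reward(response: str, reference: str) -> float:
    """Score a response by format and final answer only.

    The reasoning text itself is never inspected, so different
    reasoning paths that reach the same correct answer score the same.
    Reward values are illustrative assumptions.
    """
    match = THINK_RE.match(response.strip())
    if match is None:
        return -1.0  # malformed: missing reasoning block

    answers = BOXED_RE.findall(match.group(2))
    if len(answers) != 1:
        return -1.0  # malformed: zero or several final answers

    # Well-formed: reward correctness, mildly penalise a wrong answer.
    return 1.0 if answers[0].strip() == reference.strip() else -0.5


if __name__ == "__main__":
    a = "<think>2 + 2 = 4</think> The answer is \\boxed{4}"
    b = "<think>Double 2 to get 4.</think> So \\boxed{4}"
    print(outcome_reward(a, "4"), outcome_reward(b, "4"))
```

Because such a reward checks only the outcome and the format, it needs reference answers rather than full worked solutions. The same property helps explain why reward hacking and unfaithful reasoning chains are a concern: nothing in the score inspects the reasoning itself.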
