techminis

A naukri.com initiative

Analyticsindiamag

Reinforcement Learning Won Again, This Time With Microsoft

  • Microsoft's Phi family is a line of lightweight models that perform strongly for their size and, on some evaluations, outperform larger competitors.
  • Phi-4 Reasoning, a 14-billion-parameter model, was improved using supervised fine-tuning and reinforcement learning.
  • These models excel at coding, math, and scientific tasks, on some benchmarks matching or surpassing much larger models such as DeepSeek R1.
  • The success is attributed to a high-quality training dataset of over 1.4 million prompts, with answers generated by OpenAI's o3-mini model.
  • Reinforcement learning (RL) in the Phi models rewards the outcome rather than the path, so the model may reach an answer in varied ways as long as the final result is correct.
  • The RL process in Microsoft's models focuses on mathematical reasoning by incentivising correctness and proper formatting.
  • RL on Microsoft's models required less data than supervised fine-tuning, yet it improved accuracy across evaluations.
  • Despite successes, challenges remain, including resource consumption, slower response times, and contradictions in reasoning steps.
  • Issues such as reward hacking, and discrepancies between a model's stated reasoning chain and the process it actually follows, raise concerns in the AI community.
  • Efforts to enhance interpretability and safety of reasoning models continue, aiming to understand model behaviors and improve overall performance.
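
The outcome-based reward described in the bullets above (score the final answer and the formatting, not the route taken) can be illustrated with a small Python sketch. This is not Microsoft's actual reward function: the weights, the `<think>...</think>` plus `\boxed{...}` response format, and the names `outcome_reward` and `extract_final_answer` are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical weights chosen for illustration; not taken from any report.
CORRECT_REWARD = 1.0
FORMAT_REWARD = 0.25

# Assumed response shape: reasoning inside <think>...</think>, followed by
# a final answer written as \boxed{...}.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = BOXED_RE.findall(response)
    return matches[-1].strip() if matches else None


def outcome_reward(response: str, reference: str) -> float:
    """Score a response on its final answer and format only.

    The reasoning text itself is never inspected, so two different
    chains of thought that end in the same answer score the same.
    """
    reward = 0.0
    answer = extract_final_answer(response)
    # Format bonus: a reasoning block is present and an answer is boxed.
    if THINK_RE.search(response) and answer is not None:
        reward += FORMAT_REWARD
    # Correctness bonus: exact string match against the reference answer.
    if answer is not None and answer == reference.strip():
        reward += CORRECT_REWARD
    return reward


if __name__ == "__main__":
    direct = "<think>2 + 2 is 4</think> \\boxed{4}"
    counting = "<think>count up: 1, 2, 3, 4</think> The answer is \\boxed{4}"
    print(outcome_reward(direct, "4"), outcome_reward(counting, "4"))
```

Because the score depends only on the final answer and the format, a response with flawed reasoning that happens to box the right answer still earns full reward. That gap is one way reward hacking, and a mismatch between the stated reasoning and the real process, can arise.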
