Source: Medium

What makes DeepSeek Math solve IMO questions

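  • DeepSeek Math is trained with a rule-based reward system to solve IMO questions.
  • The reward system combines accuracy rewards, which check whether the final answer is correct, and format rewards, which check whether the output follows the required structure.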
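  • For reinforcement learning, DeepSeek Math adopted Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that replaces the learned value model with a baseline computed from a group of sampled outputs.
  • Policy updates are bounded, through a clipped objective and a KL penalty against a reference model, so the model does not drift far from what it has already learned (catastrophic forgetting).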
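
The points above can be illustrated with a minimal Python sketch. It is not DeepSeek's implementation: the output convention (`<think>` tags and a `\boxed{}` answer), the reward weights, the use of the population standard deviation, and all function names are assumptions made for illustration, and the KL penalty is left out for brevity.

```python
import re
from statistics import mean, pstdev

# Hypothetical output convention (an assumption, not from the article):
# reasoning inside <think>...</think>, then the final answer in \boxed{...}.
FORMAT_RE = re.compile(r"^<think>.*?</think>.*\\boxed\{([^{}]*)\}\s*$", re.DOTALL)


def rule_based_reward(output: str, reference_answer: str) -> float:
    """Format reward plus accuracy reward; the 0.5 / 1.0 weights are illustrative."""
    match = FORMAT_RE.match(output.strip())
    if match is None:
        return 0.0  # wrong structure: no format reward, answer is not checked
    format_reward = 0.5
    is_correct = match.group(1).strip() == reference_answer.strip()
    accuracy_reward = 1.0 if is_correct else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sampled output against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mu) / sigma for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped term: bounds how far a single update can move the policy."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to the same question, whose reference answer is "4".
    group = [
        "<think>2 + 2 = 4</think> \\boxed{4}",
        "<think>2 + 2 = 5</think> \\boxed{5}",
        "\\boxed{4}",
        "<think>two pairs make four</think> \\boxed{4}",
    ]
    rewards = [rule_based_reward(o, "4") for o in group]
    print(rewards)
    print(group_relative_advantages(rewards))
    print(clipped_surrogate(ratio=1.5, advantage=1.0))
```

Because the baseline comes from the group itself, no separate value network is needed, and the clipping range `eps` limits how much any one sample can change the policy in a single step.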
