techminis

A naukri.com initiative

Source: Medium

DeepSeek’s AI improves when rewarded

  • DeepSeek R1 is trained with Group Relative Policy Optimization (GRPO), which samples a group of candidate outputs for each prompt, scores them, and reinforces the ones that score above the group average.
  • Unlike traditional methods with fixed reward functions, DeepSeek R1 dynamically selects reward functions based on the task and goals of the prompt.
  • Multiple reward functions can be used to balance accuracy, speed, compliance, and safety.
  • This approach increases transparency, understanding, and trust in the AI's reasoning decisions.
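
The points above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not DeepSeek's actual code: the reward functions `contains_answer` and `brevity`, the weights, and the sample outputs are all hypothetical. It shows two ideas from the summary: combining several reward functions into one score, and converting a group's scores into group-relative advantages by subtracting the group mean and dividing by the group standard deviation.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# A reward function maps (prompt, output) to a score. All names here are hypothetical.
RewardFn = Callable[[str, str], float]


def contains_answer(prompt: str, output: str) -> float:
    """Toy accuracy reward: 1.0 if the expected answer "4" appears in the output."""
    return 1.0 if "4" in output else 0.0


def brevity(prompt: str, output: str) -> float:
    """Toy efficiency reward: shorter outputs score higher."""
    return 1.0 / (1.0 + len(output.split()))


def combined_reward(prompt: str, output: str,
                    reward_fns: Dict[str, RewardFn],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of several reward functions (the weights are an assumption)."""
    return sum(weights[name] * fn(prompt, output) for name, fn in reward_fns.items())


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Score each output relative to its group: (reward - group mean) / group std.

    The population standard deviation and the eps term are choices made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    prompt = "What is 2 + 2?"
    group = ["4", "The answer is 4", "I think it is 5", "22"]
    fns = {"accuracy": contains_answer, "brevity": brevity}
    weights = {"accuracy": 1.0, "brevity": 0.5}

    rewards = [combined_reward(prompt, out, fns, weights) for out in group]
    for out, adv in zip(group, group_relative_advantages(rewards)):
        print(f"{adv:+.2f}  {out!r}")
```

Outputs with a positive advantage scored above the group average and would be reinforced during training; those with a negative advantage would be discouraged. Because the score is relative to the group, no separate value model is needed to judge how good a reward is.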
