Towards Data Science

How to Train LLMs to “Think” (o1 & DeepSeek-R1)

  • The article distills lessons from o1 and DeepSeek-R1, focusing on how increased test-time compute affects model performance.
  • o1 demonstrated that generating more tokens at inference time yields better responses, pointing to a new test-time scaling law for LLMs.
  • o1 introduced 'thinking' tokens during post-training that demarcate the model's chain of thought (CoT), giving human-readable insight into how the model reasons.
  • DeepSeek-R1, released in January 2025, explored how reasoning can be induced in LLMs through reinforcement learning (RL).
  • The release comprises two models: DeepSeek-R1-Zero, trained with RL alone, and DeepSeek-R1, trained with a combination of supervised fine-tuning (SFT) and RL.
  • R1-Zero developed emergent reasoning abilities through RL alone, discovering CoT generation and test-time compute scaling on its own.
  • R1-Zero's RL recipe combines a simple prompt template, a dual-component reward (answer accuracy plus output formatting), and Group Relative Policy Optimization (GRPO) for stable training; minimal sketches of these pieces follow this list.
  • DeepSeek-R1 was then trained with a multi-step pipeline that interleaves SFT and RL to strengthen reasoning while keeping outputs readable (outlined after this list).
  • The article highlights the interpretability issues of R1-Zero's raw outputs and the training steps taken to address them.
  • After several rounds of SFT and RL, including RL driven by human-preference rewards, DeepSeek-R1 performs strongly on reasoning tasks.
  • Together, o1 and DeepSeek-R1 showcase how far RL can push LLM reasoning and point to promising research directions for models that learn to reason with minimal human supervision.
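
As a rough illustration of the R1-Zero-style setup described above (thinking tokens, a fixed prompt template, and a dual-component reward), here is a minimal Python sketch. The <think>/<answer> tag names, the template wording, the exact-match accuracy check, and the equal weighting of the two reward components are assumptions made for illustration, not the published implementation.

```python
import re

# Assumed R1-Zero-style template: the model is asked to reason inside
# <think> tags and to give its final result inside <answer> tags.
PROMPT_TEMPLATE = (
    "A conversation between User and Assistant. The Assistant first thinks "
    "through the problem inside <think>...</think>, then gives the final "
    "answer inside <answer>...</answer>.\n"
    "User: {question}\nAssistant:"
)

def format_reward(completion: str) -> float:
    """1.0 if the completion contains well-formed <think> and <answer> blocks."""
    ok = re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL)
    return 1.0 if ok else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer matches the reference (exact match is a simplification)."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == reference.strip() else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Dual-component reward: correctness plus adherence to the thinking format.
    return accuracy_reward(completion, reference) + format_reward(completion)

# Example usage
completion = "<think>2 + 2 = 4</think> <answer>4</answer>"
print(total_reward(completion, "4"))  # -> 2.0
```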

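GRPO, mentioned in the RL bullet above, scores each sampled completion relative to the other completions drawn for the same prompt instead of relying on a learned value function. The sketch below shows that group-relative advantage plus a PPO-style clipped surrogate loss; the group size, the sequence-level (rather than token-level) ratios, the clipping threshold, and the omission of the KL penalty toward a reference policy are simplifying assumptions, not the exact DeepSeek recipe.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize each completion's reward against its group (one group per prompt).

    rewards: shape (group_size,), rewards for G completions sampled from the same
    prompt. GRPO uses these normalized scores as advantages instead of a critic.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_surrogate_loss(ratios: np.ndarray, advantages: np.ndarray, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate objective, averaged over the group.

    ratios: pi_new / pi_old for each completion (sequence-level here for brevity;
    the full objective also adds a KL penalty toward a reference policy, omitted).
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1 - clip_eps, 1 + clip_eps) * advantages
    return float(-np.minimum(unclipped, clipped).mean())

# Example: four completions for one prompt, scored with the dual-component reward above.
rewards = np.array([2.0, 1.0, 0.0, 2.0])
adv = group_relative_advantages(rewards)
loss = grpo_surrogate_loss(np.array([1.05, 0.9, 1.2, 0.95]), adv)
print(adv, loss)
```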
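
Finally, the multi-step pipeline referenced in the later bullets (SFT, RL, and human feedback) can be read as an ordered sequence of training stages. The outline below is an assumed sketch of that ordering with toy run_sft/run_rl stand-ins that only record which stage was applied; the stage descriptions are a plausible reading of the summary, not the actual DeepSeek training code.

```python
# Assumed outline of a multi-stage SFT + RL pipeline like the one summarized above.
# run_sft / run_rl are toy stand-ins that only record which stages were applied.

def run_sft(model: dict, data: str) -> dict:
    model["stages"].append(f"SFT on {data}")
    return model

def run_rl(model: dict, data: str) -> dict:
    model["stages"].append(f"RL with {data}")
    return model

PIPELINE = [
    (run_sft, "curated long chain-of-thought examples (cold start)"),
    (run_rl,  "rule-based accuracy + format rewards (as sketched earlier)"),
    (run_sft, "filtered completions from the RL model plus general instruction data"),
    (run_rl,  "reward signals reflecting human preferences"),
]

def train(base_model: dict) -> dict:
    model = base_model
    for step, data in PIPELINE:
        model = step(model, data)
    return model

print(train({"stages": []})["stages"])
```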