Source: arXiv

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

  • Large Language Models (LLMs) such as GPT-3 have achieved great success in single-turn tasks like summarization.
  • However, they struggle with multi-turn tasks like dialogue that require long-term planning.
  • To address this, the researchers introduce REgressing the RELative FUture (REFUEL), an efficient policy optimization approach for multi-turn reinforcement learning from human feedback (RLHF) in LLMs; a sketch of the core regression idea appears after this list.
  • REFUEL outperforms state-of-the-art methods like DPO and REBEL, and can match the performance of any policy covered by the training set.

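The summary above does not spell out the training objective, but the title points at the core move: treat multi-turn policy optimization as regressing the *relative* future reward of paired rollouts onto the policy's log-probability ratios, in the style of a REBEL-like square-loss update. The sketch below is a minimal illustration of that idea under those assumptions; the function name, tensor layout, and `eta` step-size parameter are illustrative stand-ins, not the authors' code.

```python
import torch

def relative_future_regression_loss(
    logp_new: torch.Tensor,           # log pi_theta(a | s) for rollout A
    logp_old: torch.Tensor,           # log pi_t(a | s) for rollout A (behavior policy)
    logp_new_alt: torch.Tensor,       # log pi_theta(a' | s) for rollout B
    logp_old_alt: torch.Tensor,       # log pi_t(a' | s) for rollout B
    future_reward: torch.Tensor,      # observed reward-to-go of rollout A
    future_reward_alt: torch.Tensor,  # observed reward-to-go of rollout B
    eta: float = 1.0,                 # assumed step-size hyperparameter
) -> torch.Tensor:
    # Difference of log-probability ratios between two rollouts that
    # branch from the same dialogue state.
    ratio_diff = (logp_new - logp_old) - (logp_new_alt - logp_old_alt)
    # The regression target is the *relative* future: how much better
    # one continuation turned out than the other.
    target = future_reward - future_reward_alt
    # Least-squares regression of the scaled ratio difference onto the
    # relative future reward; minimizing this implies the policy update.
    return ((ratio_diff / eta - target) ** 2).mean()

# Toy usage with random stand-in values for a batch of 8 paired rollouts.
if __name__ == "__main__":
    b = 8
    loss = relative_future_regression_loss(
        torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b),
        torch.randn(b), torch.randn(b),
    )
    print(loss)
```

Regressing a difference of paired outcomes, rather than an absolute value estimate, is the usual appeal of this family of updates: the relative target can be computed directly from sampled rollouts without a separate critic.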