techminis

A naukri.com initiative

Image Credit: Arxiv

Multi-Task Reward Learning from Human Ratings

  • Reinforcement learning from human feedback (RLHF) is crucial for aligning model behavior with user goals.
  • Existing RLHF methods oversimplify human decision-making by framing it as an isolated task, such as classification or regression.
  • A new reinforcement learning method presented in an arXiv paper jointly considers multiple tasks to better mimic human decision-making.
  • The proposed method learns a reward function from human ratings in reward-free settings, striking a balance between classification and regression models.
  • The approach accounts for uncertainty in human decision-making and allows the emphasis between the two strategies to adapt.
  • Experiments with synthetic human ratings show the new method outperforms existing rating-based RL techniques.
  • In certain scenarios it even outperforms traditional RL approaches.
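The balance between classification and regression that the summary describes can be illustrated with a toy multi-task loss. Note this is a minimal sketch, not the paper's actual formulation: the rating-class boundaries, the midpoint construction, and the weighting parameter `alpha` are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multitask_rating_loss(reward_pred, rating, boundaries, alpha=0.5):
    """Toy multi-task loss for rating-based reward learning.

    Classification term: treat the human rating as a class label, with
    class logits derived from the predicted reward's distance to
    (hypothetical) rating-interval midpoints.
    Regression term: pull the predicted reward toward the midpoint of
    the rated interval.
    `boundaries` and `alpha` are illustrative assumptions.
    """
    # Midpoint of each rating interval, e.g. [0, .25, .5, .75, 1] -> 4 classes.
    midpoints = (boundaries[:-1] + boundaries[1:]) / 2
    # Closer reward -> higher logit for that rating class.
    logits = -np.abs(reward_pred - midpoints)
    probs = softmax(logits)
    ce = -np.log(probs[rating] + 1e-12)           # classification loss
    mse = (reward_pred - midpoints[rating]) ** 2  # regression loss
    # alpha interpolates between the two strategies.
    return alpha * ce + (1 - alpha) * mse
```

A prediction near the rated interval's midpoint incurs a lower combined loss than one far from it, so minimizing this objective shapes the reward function with both signals at once; tuning `alpha` shifts the emphasis between them.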
