Source: Arxiv


Active teacher selection for reinforcement learning from human feedback

  • Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback, but these systems typically assume that all feedback comes from a single human teacher, even though they query a range of distinct teachers.
  • The authors propose the Hidden Utility Bandit (HUB) framework, which models differences in teacher rationality, expertise, and costliness and formalizes the problem of learning from multiple teachers.
  • The Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively deciding when to query a teacher and which teacher to query.
  • Together, the HUB framework and the ATS algorithm lay groundwork for future research on active teacher selection for robust reward modeling.
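The summary does not describe how HUB or ATS work internally, so the following is only a minimal, hypothetical Python sketch of the general idea of choosing *when* and *which* teacher to query. It assumes that each teacher is a Boltzmann-rational comparer with a rationality parameter `beta` and a per-query `cost`, that the hidden utility is one of a small finite set of candidate vectors, and that the learner greedily picks the teacher whose expected information gain (valued at a hypothetical `value_per_nat`) most exceeds its cost, or queries no one when no query pays off. This myopic value-of-information rule is a swapped-in illustration, not the paper's ATS algorithm, and all names are illustrative.

```python
import math
from dataclasses import dataclass

# Hypothetical teacher model (an assumption, not the paper's HUB definition).
@dataclass(frozen=True)
class Teacher:
    name: str
    beta: float  # rationality: higher beta -> choices track utilities more reliably
    cost: float  # price paid per query

def pref_prob(beta, u_i, u_j):
    """Boltzmann-rational probability that a teacher prefers item i over item j."""
    return 1.0 / (1.0 + math.exp(-beta * (u_i - u_j)))

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def update(prior, hypotheses, beta, i, j, prefers_i):
    """Bayes update of the belief over utility hypotheses after one comparison."""
    post = []
    for w, u in zip(prior, hypotheses):
        p = pref_prob(beta, u[i], u[j])
        post.append(w * (p if prefers_i else 1.0 - p))
    z = sum(post)  # equals the predictive probability of the observed answer
    return [x / z for x in post]

def info_gain(prior, hypotheses, beta, i, j):
    """Expected entropy reduction from asking a teacher with rationality beta to compare i and j."""
    p_i = sum(w * pref_prob(beta, u[i], u[j]) for w, u in zip(prior, hypotheses))
    h_after = 0.0
    for answer, p_ans in ((True, p_i), (False, 1.0 - p_i)):
        if p_ans > 0:
            h_after += p_ans * entropy(update(prior, hypotheses, beta, i, j, answer))
    return entropy(prior) - h_after

def select_teacher(prior, hypotheses, teachers, i, j, value_per_nat=1.0):
    """Greedy (myopic) choice: the teacher with the best positive net value, else None (don't query)."""
    best, best_net = None, 0.0
    for t in teachers:
        net = value_per_nat * info_gain(prior, hypotheses, t.beta, i, j) - t.cost
        if net > best_net:
            best, best_net = t, net
    return best

# Usage: two candidate utility vectors over items 0 and 1, with a uniform prior.
hypotheses = [(1.0, 0.0), (0.0, 1.0)]
prior = [0.5, 0.5]
teachers = [Teacher("novice", beta=0.5, cost=0.01), Teacher("expert", beta=5.0, cost=0.1)]
choice = select_teacher(prior, hypotheses, teachers, 0, 1)
print(choice.name if choice else "no query")  # → expert
```

With a nearly certain prior such as `[0.999, 0.001]`, no comparison can reduce uncertainty by more than the cheapest teacher's cost, so `select_teacher` returns `None`; this is the sketch's stand-in for deciding *when* to query. Raising the expert's cost above its expected gain makes the cheaper, noisier novice the preferred choice instead.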
