techminis

A naukri.com initiative

Image Credit: Arxiv

Learning a Canonical Basis of Human Preferences from Binary Ratings

  • Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF).
  • This paper focuses on understanding the preferences encoded in datasets used for RLHF and identifying common human preferences.
  • The authors find that a set of 21 preference categories captures over 89% of the preference variation across individuals, forming a canonical basis of human preferences.
  • This preference basis proves useful for both model evaluation and training, offering insight into how well models align with human preferences and into what makes fine-tuning succeed.
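The core idea — that many individual raters' binary preferences can be summarized by a small shared basis — can be illustrated with a toy sketch. The code below is a hypothetical illustration, not the paper's actual method: it simulates binary ratings generated from a few latent preference directions, then uses SVD on the centered rating matrix to show that a handful of components explain most of the variance. All names and parameters (`n_raters`, `n_items`, `k_true`) are assumptions for the demo.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact procedure):
# each annotator is a vector of binary ratings (+1 / -1) over a shared
# set of prompt-response pairs; SVD recovers a low-rank basis that
# explains most of the preference variation across annotators.

rng = np.random.default_rng(0)
n_raters, n_items, k_true = 200, 300, 5

# Simulate latent structure: a few shared preference directions,
# mixed with per-rater weights, then thresholded to binary ratings.
basis = rng.normal(size=(k_true, n_items))      # canonical directions
weights = rng.normal(size=(n_raters, k_true))   # per-rater mixture
scores = weights @ basis                        # real-valued preferences
ratings = np.sign(scores + 0.1 * rng.normal(size=scores.shape))

# SVD of the centered rating matrix; cumulative variance explained.
centered = ratings - ratings.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)

# A few components should capture most of the variance, mirroring
# the paper's finding that ~21 categories cover >89% of variation.
print(f"variance explained by top {k_true} components: {explained[k_true - 1]:.2%}")
```

In this toy setup the cumulative explained-variance curve rises steeply over the first few components and then flattens, which is the qualitative signature of a canonical basis; the paper reports the analogous finding on real RLHF preference data.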
