Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

  • Modeling human preferences is essential for aligning foundation models with human values.
  • Traditional reward modeling methods such as the Bradley-Terry (BT) model fall short of expressing complex preference structures, in particular intransitive (cyclic) preferences.
  • This study introduces preference embedding: responses are embedded into a latent space that captures intricate preference structures while requiring only linear query complexity (see the sketch after this list).
  • General Preference Optimization (GPO), built on the resulting preference scores, is proposed as a generalization of reward-based reinforcement learning from human feedback (RLHF).
  • Experimental results demonstrate that the General Preference embedding Model (GPM) consistently outperforms the BT reward model on the RewardBench benchmark and effectively models cyclic preferences.
  • After post-training language models with GPO and the general preference model, evaluations on downstream tasks such as AlpacaEval 2.0 show performance gains over BT-model baselines.
  • The approach appears promising for aligning foundation models with diverse human values beyond what existing reward models can capture.
  • The code for this model is available at https://github.com/general-preference/general-preference-model.
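The sketch below illustrates the preference-embedding idea in NumPy: each response is embedded once into a 2k-dimensional vector, and pairwise preference scores come from a skew-symmetric operator, so that s(y1 > y2) = -s(y2 > y1). The operator construction, embedding values, and function names here are illustrative assumptions for exposition, not the released code's API; see the GitHub link above for the authors' implementation.

```python
import numpy as np

def skew_operator(k: int) -> np.ndarray:
    """Block-diagonal skew-symmetric operator R in R^{2k x 2k}.

    Each 2x2 block is a 90-degree rotation, which guarantees
    <R a, b> = -<R b, a>, i.e. s(y1 > y2) = -s(y2 > y1).
    """
    block = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.kron(np.eye(k), block)

def preference_score(v1: np.ndarray, v2: np.ndarray, R: np.ndarray) -> float:
    """Preference score s(y1 > y2) = <R v1, v2>."""
    return float(v2 @ (R @ v1))

def preference_prob(v1: np.ndarray, v2: np.ndarray, R: np.ndarray) -> float:
    """P(y1 > y2) = sigmoid(s(y1 > y2))."""
    return 1.0 / (1.0 + np.exp(-preference_score(v1, v2, R)))

k = 1
R = skew_operator(k)

# Hypothetical 2-d embeddings spaced 120 degrees apart produce a
# preference cycle A > B > C > A, which no scalar (BT) reward can express.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
vA, vB, vC = [np.array([np.cos(a), np.sin(a)]) for a in angles]

print(preference_prob(vA, vB, R))  # ~0.70: A preferred over B
print(preference_prob(vB, vC, R))  # ~0.70: B preferred over C
print(preference_prob(vC, vA, R))  # ~0.70: C preferred over A

# Linear query complexity: N responses need only N embedding passes;
# all pairwise scores then follow from one matrix product,
# with S[i, j] = s(y_i > y_j).
V = np.stack([vA, vB, vC])  # (N, 2k) embeddings, one pass per response
S = V @ R.T @ V.T           # (N, N) score matrix, skew-symmetric
```

Because the score is skew-symmetric rather than induced by a single scalar reward, intransitive preferences become representable, which is exactly the BT-model failure mode that the RewardBench and cyclic-preference experiments probe.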
