techminis
A naukri.com initiative
Image Credit: Arxiv

Pairwise Calibrated Rewards for Pluralistic Alignment

  • Proposing a method that captures diverse human preferences with a distribution over multiple reward functions rather than a single reward model.
  • Introducing a strategy to learn this distribution directly from pairwise preference data, without predefined annotator groups or annotator identifiers.
  • Focusing on pairwise calibration: the proportion of reward functions favoring a response should match the proportion of annotators who prefer it.
  • Validating the method's effectiveness at representing pluralistic values through improved calibration results.
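The calibration notion in the bullets above can be illustrated with a toy sketch. This is not the paper's implementation; the ensemble, the toy data, and the binned error summary are all assumptions, meant only to show what "fraction of reward functions favoring a response matches the annotator preference rate" could look like numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K reward functions, each scoring two candidate
# responses (a, b) for each of N prompts. A distribution over reward
# functions is "pairwise calibrated" when the fraction of functions
# preferring a matches the fraction of annotators who preferred a.
K, N = 8, 200
rewards_a = rng.normal(size=(K, N))  # reward of response a under each function
rewards_b = rng.normal(size=(K, N))  # reward of response b under each function

# Ensemble "vote": fraction of reward functions favoring a, per pair.
model_frac = (rewards_a > rewards_b).mean(axis=0)

# Empirical annotator preference rate for a on each pair (toy data;
# in practice this comes from the pairwise preference annotations).
annotator_frac = rng.uniform(size=N)

# A simple binned, expected-calibration-error-style summary: within
# each bin of predicted preference rate, compare the ensemble's mean
# vote to the annotators' mean preference rate.
bins = np.linspace(0.0, 1.0, 6)
idx = np.digitize(model_frac, bins[1:-1])
ece = 0.0
for b in range(len(bins) - 1):
    mask = idx == b
    if mask.any():
        ece += mask.mean() * abs(model_frac[mask].mean()
                                 - annotator_frac[mask].mean())
print(f"pairwise calibration error (toy): {ece:.3f}")
```

A perfectly calibrated ensemble would drive this gap toward zero in every bin; a single reward function can only ever vote 0 or 1, which is why a distribution over reward functions is needed to match intermediate annotator preference rates.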
