- Proposing a method for capturing diverse human preferences by modeling a distribution over multiple reward functions.
- Introducing a strategy to learn this distribution directly from pairwise preferences, without predefined groups or annotator identifiers.
- Focusing on pairwise calibration: the proportion of reward functions favoring a response should match the proportion of annotators who prefer it.
- Validating that the proposed method represents pluralistic values, as shown by improved calibration results; a minimal training sketch follows this list.
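The sketch below illustrates one way such a pairwise-calibration objective could be trained, assuming PyTorch and fixed response embeddings. The ensemble class, the soft-preference surrogate (a sigmoid over reward differences averaged across heads), and all names such as `RewardEnsemble` and `pairwise_calibration_loss` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: an ensemble of reward heads trained so that its soft
# preference rate for response A matches the observed fraction of
# annotators preferring A. Hypothetical implementation, not the paper's.
import torch
import torch.nn as nn


class RewardEnsemble(nn.Module):
    """K independent linear reward heads over fixed response embeddings."""

    def __init__(self, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(embed_dim, 1) for _ in range(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim) -> rewards: (batch, num_heads)
        return torch.cat([head(x) for head in self.heads], dim=-1)


def pairwise_calibration_loss(model, emb_a, emb_b, pref_a):
    """Squared error between the ensemble's soft preference proportion
    for response A and the annotator proportion preferring A."""
    r_a = model(emb_a)  # (batch, K)
    r_b = model(emb_b)  # (batch, K)
    # Soft indicator that head k prefers A; averaging over heads gives a
    # differentiable estimate of the proportion of reward functions favoring A.
    soft_prefs = torch.sigmoid(r_a - r_b).mean(dim=-1)  # (batch,)
    return torch.mean((soft_prefs - pref_a) ** 2)


if __name__ == "__main__":
    # Toy usage on random embeddings with synthetic annotator proportions.
    torch.manual_seed(0)
    model = RewardEnsemble(embed_dim=16, num_heads=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    emb_a, emb_b = torch.randn(64, 16), torch.randn(64, 16)
    pref_a = torch.rand(64)  # fraction of annotators preferring response A
    for _ in range(200):
        opt.zero_grad()
        loss = pairwise_calibration_loss(model, emb_a, emb_b, pref_a)
        loss.backward()
        opt.step()
    print(f"final calibration loss: {loss.item():.4f}")
```

The sigmoid surrogate here is just one differentiable stand-in for the hard count of reward functions preferring a response; the paper's own objective may differ.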