<ul><li>This paper explores integrating response time data into human preference learning frameworks for more effective reward model elicitation.</li><li>Novel methodologies are proposed to incorporate response time information alongside binary choice data, using the Evidence Accumulation Drift Diffusion (EZ) model.</li><li>Neyman-orthogonal loss functions are developed to achieve oracle convergence rates for reward model learning, improving sample efficiency compared to conventional preference learning.</li><li>Theoretical analysis and experiments validate the effectiveness of incorporating response time information in preference learning over images.</li></ul>
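<p>To make the role of response time concrete, here is a minimal sketch of the standard symmetric drift-diffusion quantities that EZ-style models build on. It assumes unit noise, decision barriers at ±<code>barrier</code>, an unbiased starting point, and no non-decision time. The function names are hypothetical, and this is not the paper's estimator, only an illustration of why response times carry information about the reward difference beyond the binary choice.</p>

```python
import math


def choice_prob(reward_diff: float, barrier: float) -> float:
    """Probability of choosing option 1 when the drift equals reward_diff.

    Assumes a symmetric diffusion with unit noise and barriers at +/- barrier.
    """
    return 1.0 / (1.0 + math.exp(-2.0 * barrier * reward_diff))


def mean_response_time(reward_diff: float, barrier: float) -> float:
    """Expected decision time under the same assumptions."""
    if abs(reward_diff) < 1e-12:
        # Zero-drift limit: mean exit time of unit Brownian motion from (-a, a).
        return barrier ** 2
    return (barrier / reward_diff) * math.tanh(barrier * reward_diff)


def mean_signed_choice(reward_diff: float, barrier: float) -> float:
    """Expected choice coded as +1 (option 1) or -1 (option 2)."""
    return 2.0 * choice_prob(reward_diff, barrier) - 1.0


if __name__ == "__main__":
    v, a = 0.5, 1.0  # illustrative values, not taken from the paper
    ratio = mean_signed_choice(v, a) / mean_response_time(v, a)
    # Under these assumptions the ratio equals v / a, so it is linear in the
    # reward difference, whereas the choice probability alone saturates.
    print(ratio, v / a)
```

<p>Under these assumptions, the expected signed choice divided by the expected response time equals the reward difference scaled by the barrier, which is one intuition for how timing data can improve sample efficiency over choices alone.</p>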

Preference Learning with Response Time
