Arxiv
Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

  • The paper studies reinforcement learning from human feedback in general Markov decision processes, where the agent learns from trajectory-level preference comparisons.
  • Challenge: designing algorithms that select informative preference queries to identify the underlying reward while retaining theoretical guarantees.
  • The authors propose a meta-algorithm based on randomized exploration that addresses this challenge while remaining computationally tractable.
  • They establish regret and last-iterate guarantees under mild reinforcement learning oracle assumptions.
  • An improved algorithm collects batches of trajectory pairs and applies optimal experimental design to choose informative comparison queries.
  • The batch structure lets preference queries be answered in parallel, which makes practical deployment more efficient.
  • Empirical evaluation indicates the method is competitive with reward-based reinforcement learning while requiring only a small number of preference queries.
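
To make the idea of choosing informative comparison queries by experimental design more concrete, here is a minimal sketch. It is not the paper's algorithm, and the summary above does not specify the preference model or the design criterion. The sketch assumes that the reward is linear in trajectory features, that preferences follow a Bradley-Terry (logistic) model, and that a batch is chosen by a greedy D-optimal rule. The names `preference_prob`, `quad_form`, and `greedy_d_optimal` are hypothetical.

```python
import math


def preference_prob(theta, diff):
    """Assumed Bradley-Terry model: P(first trajectory preferred) for a
    feature difference diff = phi(tau) - phi(tau') and reward weights theta."""
    score = sum(t * x for t, x in zip(theta, diff))
    return 1.0 / (1.0 + math.exp(-score))


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def quad_form(V_inv, x):
    """Compute x^T V_inv x, the information gain proxy for query x."""
    return sum(a * b for a, b in zip(x, mat_vec(V_inv, x)))


def greedy_d_optimal(diffs, k, ridge=1.0):
    """Greedily pick up to k candidate pairs that maximise det of the design matrix.

    diffs[i] is the feature difference of candidate trajectory pair i.
    Returns the chosen indices in the order they were picked.
    """
    d = len(diffs[0])
    # Start from the inverse of the regularised design matrix V = ridge * I.
    V_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    chosen = []
    remaining = set(range(len(diffs)))
    for _ in range(min(k, len(diffs))):
        # Matrix determinant lemma: adding x scales det(V) by 1 + x^T V^{-1} x,
        # so the best next query maximises the quadratic form.
        # Ties are broken in favour of the lowest index.
        best = max(sorted(remaining), key=lambda i: quad_form(V_inv, diffs[i]))
        chosen.append(best)
        remaining.remove(best)
        # Sherman-Morrison rank-one update of V^{-1} (V_inv stays symmetric).
        x = diffs[best]
        u = mat_vec(V_inv, x)
        denom = 1.0 + sum(a * b for a, b in zip(x, u))
        V_inv = [[V_inv[i][j] - u[i] * u[j] / denom for j in range(d)]
                 for i in range(d)]
    return chosen
```

Calling `greedy_d_optimal(diffs, k)` on a batch of candidate pairs returns the indices to send for labeling. Because the whole batch is fixed before any answer arrives, the selected comparisons can be collected concurrently, which is the parallelization benefit the summary describes.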