<ul><li>The dueling bandit problem is attracting growing interest because of its applications in online advertising, recommendation systems, and related areas.</li><li>Delayed feedback poses a challenge for existing dueling bandit methods, because it limits how quickly and accurately the agent can update its policy.</li><li>A new problem, the biased dueling bandit problem with stochastic delayed feedback, is introduced; it involves a preference bias between the selections.</li><li>Two algorithms are presented to handle delayed feedback: one requires complete information about the delay distribution, and the other requires only the expected value of the delay.</li></ul>
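
To make the setting concrete, here is a minimal Python sketch of a dueling bandit environment with stochastic delayed feedback. It is not either of the two algorithms mentioned above. The function name `simulate_delayed_duels`, the additive bias toward the first-listed arm, the geometric delay, and the uniform random pair selection are all illustrative assumptions that the summary does not specify.

```python
import heapq
import random


def simulate_delayed_duels(prefs, bias, mean_delay, horizon, seed=0):
    """Simulate duels whose outcomes become visible only after a random delay.

    prefs[i][j]: assumed probability that arm i beats arm j without bias.
    bias: assumed additive advantage for the first-listed arm (hypothetical model).
    mean_delay: mean of an assumed geometric extra delay on {0, 1, 2, ...}.
    Returns (wins, observed), where wins[i][j] counts observed wins of i over j
    and observed is the number of outcomes delivered within the horizon.
    """
    rng = random.Random(seed)
    k = len(prefs)
    wins = [[0] * k for _ in range(k)]
    pending = []  # min-heap of (arrival_round, winner, loser)
    observed = 0
    p_stop = 1.0 / (1.0 + mean_delay)  # geometric stopping probability

    for t in range(horizon):
        # Deliver every outcome whose arrival round has been reached.
        while pending and pending[0][0] <= t:
            _, winner, loser = heapq.heappop(pending)
            wins[winner][loser] += 1
            observed += 1

        # Placeholder policy (an assumption): pick two distinct arms uniformly.
        i, j = rng.sample(range(k), 2)

        # Biased duel: the first-listed arm gets an additive advantage.
        p_first = min(1.0, max(0.0, prefs[i][j] + bias))
        winner, loser = (i, j) if rng.random() < p_first else (j, i)

        # Sample the extra delay: failures before the first success.
        delay = 0
        while rng.random() >= p_stop:
            delay += 1

        # The outcome is visible at the earliest in the next round.
        heapq.heappush(pending, (t + 1 + delay, winner, loser))

    return wins, observed
```

With `mean_delay=0` every outcome arrives one round after the duel, so only the final round's outcome is missing at the end of the horizon. Larger values leave more outcomes pending, which is the information gap a delay-aware algorithm has to account for.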

Biased Dueling Bandits with Stochastic Delayed Feedback
