<ul><li>The paper introduces neural variance-aware algorithms for the contextual dueling bandit problem.</li><li>The algorithms use neural networks to approximate nonlinear utility functions and apply a variance-aware exploration strategy.</li><li>The design balances the exploration-exploitation tradeoff under both the Upper Confidence Bound (UCB) and Thompson Sampling frameworks.</li><li>The paper proves sublinear cumulative average regret for both variants and validates their computational efficiency empirically.</li></ul>
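
To make the UCB variant summarized above more concrete, here is a minimal sketch of one round of variance-weighted pair selection. It is an illustration under stated assumptions, not the paper's algorithm: the feature matrix stands in for the last-layer representation of a trained network, the comparison model is assumed to be logistic, and the names `select_pair`, `variance_weighted_update`, `beta`, and `min_var` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic link, assumed here as the pairwise comparison model."""
    return 1.0 / (1.0 + np.exp(-x))


def select_pair(features, theta, cov, beta=1.0):
    """Pick a duel (first, second) among K >= 2 arms.

    features: (K, d) array, a stand-in for last-layer network features.
    theta:    (d,) current estimate of the last-layer weights.
    cov:      (d, d) positive-definite weighted covariance matrix.
    beta:     exploration weight (hypothetical tuning parameter).
    """
    # First arm: greedy with respect to the estimated utility.
    utilities = features @ theta
    first = int(np.argmax(utilities))

    # Second arm: optimistic challenger, scored by the estimated utility gap
    # plus an uncertainty bonus on the feature difference.
    diffs = features - features[first]
    cov_inv = np.linalg.inv(cov)
    quad = np.einsum("kd,de,ke->k", diffs, cov_inv, diffs)
    widths = np.sqrt(np.maximum(quad, 0.0))
    scores = diffs @ theta + beta * widths
    scores[first] = -np.inf  # assumption: an arm is never paired with itself
    second = int(np.argmax(scores))
    return first, second


def variance_weighted_update(cov, z, theta, min_var=1e-2):
    """Add the duel's feature difference z, weighted by inverse variance.

    The variance of a Bernoulli comparison outcome is p * (1 - p); it is
    floored at min_var (an assumed constant) to keep the weight bounded.
    """
    p = sigmoid(z @ theta)
    var = max(p * (1.0 - p), min_var)
    return cov + np.outer(z, z) / var


features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
theta = np.array([1.0, 0.0])
cov = np.eye(2)

first, second = select_pair(features, theta, cov, beta=1.0)
z = features[first] - features[second]
cov = variance_weighted_update(cov, z, theta)
```

Low-variance comparisons (outcomes that are nearly deterministic) receive a larger weight in the covariance update, so the confidence width along that direction shrinks faster. Restricting the exploration bonus to last-layer features keeps the matrix at size d by d instead of the full parameter count.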

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
