Researchers have introduced Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-armed bandits.
Like Thompson sampling, DPPS selects arms according to the posterior probability that each arm is optimal, but it does so without assuming a parametric family for the reward distribution.
The algorithm places Dirichlet Process priors directly on the reward-generating distribution of each arm, offering a principled way to incorporate prior beliefs.
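To make the sampling step concrete, below is a minimal sketch of one round of DP posterior sampling for rewards in [0, 1]. It assumes a Uniform(0, 1) base measure G0 and approximates a draw from each arm's DP posterior by Dirichlet-weighting the observed rewards (weight 1 each) together with a single pseudo-draw from G0 (weight alpha); as alpha tends to 0 this reduces to the Bayesian bootstrap. The function name dpps_choose_arm and this one-atom approximation are illustrative, not necessarily the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpps_choose_arm(rewards_per_arm, alpha=1.0, prior_sampler=lambda: rng.uniform(0.0, 1.0)):
    """One DPPS round: sample a mean from each arm's DP posterior, pull the argmax.

    Hypothetical sketch: a DP(alpha, G0) posterior draw's mean is approximated by
    Dirichlet(1, ..., 1, alpha) weights over the observed rewards plus one G0 draw.
    """
    sampled_means = []
    for rewards in rewards_per_arm:
        atoms = np.append(rewards, prior_sampler())     # observed rewards + one pseudo-draw from G0
        conc = np.append(np.ones(len(rewards)), alpha)  # concentration: 1 per observation, alpha for the prior atom
        weights = rng.dirichlet(conc)                   # one sample of the random measure's weights
        sampled_means.append(weights @ atoms)           # mean of the sampled distribution
    return int(np.argmax(sampled_means))                # greedy with respect to the sampled means

# Toy usage: three Bernoulli arms, 1000 rounds
true_means = [0.3, 0.5, 0.7]
history = [[] for _ in true_means]
for t in range(1000):
    a = dpps_choose_arm([np.asarray(h) for h in history])
    history[a].append(float(rng.random() < true_means[a]))
```

As in Thompson sampling, the randomness of the posterior draw supplies the exploration: arms with little data have high-variance sampled means and therefore continue to be tried occasionally.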
Empirical studies demonstrate strong performance of DPPS across a variety of bandit environments, and non-asymptotic optimality is established through an information-theoretic analysis of its Bayesian regret.
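The information-theoretic route is presumably in the spirit of Russo and Van Roy's information-ratio analysis of Thompson sampling, a standard way to obtain non-asymptotic Bayesian regret bounds; as a hedged sketch of that general framework (the DPPS-specific ratio bound and constants are not reproduced here):

```latex
% Per-round information ratio (Delta_t: instantaneous regret; A*: optimal arm;
% (A_t, Y_t): arm pulled and reward observed at round t), and the generic
% regret bound that a uniform bound on the ratio yields.
\[
  \Gamma_t \;=\; \frac{\bigl(\mathbb{E}_t[\Delta_t]\bigr)^2}{I_t\bigl(A^\star;\,(A_t, Y_t)\bigr)},
  \qquad
  \Gamma_t \le \bar{\Gamma}\ \text{for all } t
  \;\Longrightarrow\;
  \mathbb{E}\bigl[\mathrm{Regret}(T)\bigr] \;\le\; \sqrt{\bar{\Gamma}\, H(A^\star)\, T},
\]
```

where H(A*) denotes the entropy of the prior distribution of the optimal arm.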