Leveraging priors on distribution functions for multi-arm bandits

  • Researchers have introduced Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-arm bandits.
  • DPPS, akin to Thompson sampling, makes decisions based on the posterior probability that each arm is optimal, without assuming a parametric reward distribution.
  • The algorithm places Dirichlet Process priors directly on the reward-generating distributions, offering a principled way to integrate prior beliefs (see the sketch after this list).
  • Empirical studies show strong performance of DPPS across a range of bandit environments, and an information-theoretic analysis establishes its non-asymptotic optimality.
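
The paper defines DPPS precisely; as a rough, non-authoritative sketch of the general idea, one common way to draw from a Dirichlet Process posterior is a Bayesian-bootstrap-style Dirichlet reweighting of each arm's observed rewards together with prior pseudo-observations. The Python below is illustrative only: the function names, the `alpha` concentration parameter, and the uniform prior pseudo-samples are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def dp_posterior_sample_mean(rewards, prior_samples, alpha, rng):
    """Approximately sample a mean from the DP posterior over one arm's
    reward distribution, via a Dirichlet reweighting of the observed
    rewards plus prior pseudo-observations (Bayesian-bootstrap style)."""
    support = np.concatenate([prior_samples, rewards])
    # Prior mass alpha spread over the prior pseudo-samples, unit mass on
    # each observed reward, matching the DP posterior's base-measure mix.
    conc = np.concatenate([
        np.full(len(prior_samples), alpha / max(len(prior_samples), 1)),
        np.ones(len(rewards)),
    ])
    weights = rng.dirichlet(conc)
    return float(weights @ support)

def dpps_choose_arm(arm_rewards, prior_samples, alpha, rng):
    """Thompson-style rule: pull the arm with the largest sampled posterior mean."""
    sampled_means = [
        dp_posterior_sample_mean(np.asarray(r, dtype=float), prior_samples, alpha, rng)
        for r in arm_rewards
    ]
    return int(np.argmax(sampled_means))

# Toy usage: 3 Bernoulli arms, vague uniform prior pseudo-samples.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]
arm_rewards = [[] for _ in true_means]
prior_samples = rng.uniform(0.0, 1.0, size=20)
for t in range(500):
    a = dpps_choose_arm(arm_rewards, prior_samples, alpha=1.0, rng=rng)
    arm_rewards[a].append(float(rng.random() < true_means[a]))
print([len(r) for r in arm_rewards])  # most pulls should concentrate on the best arm
```

The toy loop only shows the decision pattern shared with Thompson sampling: sample one plausible reward distribution per arm from its posterior, then pull the arm whose sampled mean is largest; the paper's own prior construction and analysis should be consulted for the actual method.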
