Arxiv


Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning

  • A new log-sum-exponential estimator is introduced for off-policy evaluation and learning from logged bandit feedback datasets.
  • The estimator addresses challenges such as high variance, low-quality propensity scores, and heavy-tailed reward distributions.
  • It demonstrates variance reduction and robustness under heavy-tailed conditions, outperforming traditional inverse propensity score estimators.
  • Theoretical analysis and empirical evaluations confirm the practical advantages of the new estimator in off-policy learning and evaluation scenarios.
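
To make the contrast with inverse propensity scoring concrete, here is a minimal sketch in Python. The summary does not state the estimator's formula, so the form used here, (1/λ) · log(mean(exp(λ · w · r))) with λ < 0 and importance weight w = target probability / logging probability, is an assumption. The function names and the default value of `lam` are also hypothetical.

```python
import numpy as np


def ips_estimate(rewards, target_probs, logging_probs):
    """Standard inverse propensity score (IPS) estimate: mean of w * r."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * rewards))


def lse_estimate(rewards, target_probs, logging_probs, lam=-1.0):
    """Log-sum-exponential style estimate (assumed form, see lead-in).

    Computes (1 / lam) * log(mean(exp(lam * w * r))) for lam < 0.
    """
    if lam >= 0:
        raise ValueError("lam must be negative in this sketch")
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    z = lam * weights * rewards
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = np.max(z)
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max)))
    return float(log_mean_exp / lam)


if __name__ == "__main__":
    # Toy logged data: the last sample has a tiny logging propensity,
    # which produces an importance weight of 500.
    rewards = [1.0, 1.0, 0.0, 1.0]
    target_probs = [0.5, 0.5, 0.5, 0.5]
    logging_probs = [0.5, 0.5, 0.5, 0.001]

    print("IPS:", ips_estimate(rewards, target_probs, logging_probs))
    print("LSE:", lse_estimate(rewards, target_probs, logging_probs, lam=-1.0))
```

In this toy data the single extreme weight dominates the IPS average, while the exponential with negative `lam` shrinks that sample's contribution toward zero. By Jensen's inequality this form never exceeds the IPS value, and it approaches IPS as `lam` tends to zero, so `lam` trades bias against variance.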
