A new paper published on arXiv proposes the influential bandit problem, a multi-armed bandit formulation that captures interdependencies among arms in non-stationary environments.
The problem models arm interactions through an unknown interaction matrix that governs the dynamics of arm losses.
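To make that setup concrete, here is a minimal toy simulation of such a dynamic in Python. The matrix A, the additive loss update, the noise level, and the helper pull are all illustrative assumptions for this sketch, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5     # number of arms (illustrative)
T = 1000  # horizon (illustrative)

# Hypothetical interaction matrix: symmetric positive semi-definite here,
# purely as an illustrative choice; the paper's assumptions on the
# interaction matrix may differ.
B = rng.normal(size=(K, K))
A = B @ B.T / K

losses = rng.uniform(0.0, 1.0, size=K)  # current loss vector, one entry per arm

def pull(arm: int) -> float:
    """Observe a noisy loss for `arm`, then apply the assumed additive
    dynamics: pulling an arm shifts every arm's loss by the corresponding
    column of the interaction matrix."""
    global losses
    observed = float(losses[arm] + rng.normal(scale=0.1))
    losses = losses + A[:, arm]
    return observed

print(losses[:3])
pull(0)
print(losses[:3])  # pulling arm 0 has shifted other arms' losses too
```

Running the two print statements shows the inter-arm effect directly: a single pull of arm 0 changes the losses of arms that were never played.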
The paper establishes regret lower bounds for standard bandit algorithms and introduces a new algorithm based on a lower confidence bound (LCB) estimator.
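Continuing the toy environment above (and reusing its pull, K, and T), the sketch below shows a generic lower-confidence-bound selection rule for loss minimization. The UCB1-style confidence radius and the empirical-mean estimator are assumed stand-ins; the paper's LCB estimator and its analysis may be defined differently.

```python
import numpy as np

def lcb_select(sums: np.ndarray, counts: np.ndarray, t: int, c: float = 2.0) -> int:
    """Pick the arm with the smallest lower confidence bound on its mean loss.
    `sums[i]` and `counts[i]` hold the accumulated observed loss and pull
    count of arm i; the confidence radius below is an illustrative choice."""
    means = sums / np.maximum(counts, 1)
    radius = np.sqrt(c * np.log(max(t, 2)) / np.maximum(counts, 1))
    lcb = means - radius
    lcb[counts == 0] = -np.inf  # force each arm to be tried at least once
    return int(np.argmin(lcb))

# Interaction loop with the toy environment sketched earlier.
sums = np.zeros(K)
counts = np.zeros(K)
for t in range(1, T + 1):
    arm = lcb_select(sums, counts, t)
    loss = pull(arm)
    sums[arm] += loss
    counts[arm] += 1
```

In this loss-minimization setting, choosing the smallest lower confidence bound plays the optimistic role that the largest upper confidence bound plays in reward maximization.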
Empirical evaluations demonstrate the presence of inter-arm influence and show that the proposed method outperforms conventional bandit algorithms.