<ul><li>This work addresses a generalized restless multi-arm bandit problem with a risk-aware objective.</li><li>Indexability conditions for the risk-aware objective are established, and a solution based on the Whittle index is provided.</li><li>For the learning problem with unknown transition probabilities, a Thompson sampling approach is proposed that achieves bounded regret.</li><li>Numerical experiments illustrate the efficacy of the method in reducing risk exposure in various applications.</li></ul>
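The two ingredients named in the highlights, a Whittle index policy and Thompson sampling over unknown transition probabilities, can be illustrated with a minimal sketch. It is not the paper's method. It uses the standard risk-neutral discounted reward rather than the paper's risk-aware objective. It also assumes finite-state arms with two actions (0 = passive, 1 = active), a known reward table shared by all arms, and indexability. Under indexability, the Whittle index of a state is the passive subsidy at which both actions are equally attractive, so it can be located by binary search. Transition probabilities are drawn from Dirichlet posteriors and re-sampled at every step, which simplifies how Thompson sampling is usually scheduled. All function names (`q_values`, `whittle_index`, `sample_dirichlet`, `thompson_whittle_step`) are hypothetical.

```python
import random

# Hypothetical helpers: risk-neutral discounted objective, NOT the paper's risk-aware one.
# Conventions: action 0 = passive, action 1 = active.
# P[a][s][t] = Pr(next state t | state s, action a); R[a][s] = immediate reward.


def q_values(P, R, lam, gamma=0.9, iters=500):
    """Value iteration for one arm whose passive action earns an extra subsidy lam."""
    n = len(R[0])
    V = [0.0] * n
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(iters):
        Q = [[R[a][s] + (lam if a == 0 else 0.0)
              + gamma * sum(P[a][s][t] * V[t] for t in range(n))
              for a in (0, 1)]
             for s in range(n)]
        V = [max(q) for q in Q]
    return Q


def whittle_index(P, R, s, gamma=0.9, lo=-10.0, hi=10.0, tol=1e-4):
    """Binary search for the subsidy at which passive becomes as good as active in s.

    Assumes the arm is indexable, so the preferred action in s switches from
    active to passive exactly once as the subsidy grows, and that the index
    lies inside [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q = q_values(P, R, mid, gamma)
        if Q[s][1] > Q[s][0]:
            lo = mid  # active still strictly better: subsidy too small
        else:
            hi = mid  # passive at least as good: subsidy large enough
    return 0.5 * (lo + hi)


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def thompson_whittle_step(counts, R, states, M, rng, gamma=0.9):
    """Sample each arm's transitions from its posterior, then activate the M largest indices.

    counts[i][a][s] holds Dirichlet pseudo-counts (prior plus observed
    transitions) for arm i leaving state s under action a.
    All arms share the known reward table R (a simplifying assumption).
    """
    indices = []
    for i, s in enumerate(states):
        P = [[sample_dirichlet(row, rng) for row in counts[i][a]] for a in (0, 1)]
        indices.append(whittle_index(P, R, s, gamma))
    ranked = sorted(range(len(states)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:M])


if __name__ == "__main__":
    # Two states: 0 = "bad" (reward 0), 1 = "good" (reward 1).
    R = [[0.0, 1.0], [0.0, 1.0]]
    # Uniform Dirichlet(1, 1) prior for every arm, action, and state.
    counts = [[[[1.0, 1.0], [1.0, 1.0]] for _ in (0, 1)] for _ in range(3)]
    print(thompson_whittle_step(counts, R, states=[0, 1, 0], M=2, rng=random.Random(0)))
```

After each step, the observed transition of every arm would be added to the corresponding row of `counts`, so the posteriors concentrate as data accumulate. Replacing `q_values` with a risk-aware evaluation would be the natural point at which to change the objective, but how the paper does this is not reproduced here.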

Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem
