<ul><li>In this paper, the authors propose an efficient algorithm for the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary.</li><li>The algorithm achieves a near-minimax-optimal regret bound of O(√(|E| T log |X|)) with high probability against any adaptive adversary, where E is the set of edges, T is the number of rounds, and X is the set of paths.</li><li>It attains this bound by combining a novel loss estimator with a centroid-based decomposition.</li><li>The algorithm also applies to extensive-form games, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, with improved regret guarantees in each of these settings.</li></ul>
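To make the quantities in the regret bound concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it only counts the source-to-sink paths of a small DAG (the size of the path set X) and evaluates the expression √(|E| T log |X|) without constants. The toy graph, the node names, and the helper names count_paths and regret_scale are assumptions made purely for illustration.

```python
import math
from functools import lru_cache

# Hypothetical toy DAG (not from the paper): adjacency lists, source "s", sink "t".
EDGES = {
    "s": ["a", "b"],
    "a": ["b", "t"],
    "b": ["t"],
    "t": [],
}


def count_paths(graph, source, sink):
    """Count source-to-sink paths in a DAG with a memoized depth-first search."""

    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == sink:
            return 1
        # The number of paths from a node is the sum over its successors.
        return sum(paths_from(nxt) for nxt in graph[node])

    return paths_from(source)


def regret_scale(num_edges, horizon, num_paths):
    """Evaluate sqrt(|E| * T * log|X|), ignoring constants and extra log factors."""
    return math.sqrt(num_edges * horizon * math.log(num_paths))


num_edges = sum(len(successors) for successors in EDGES.values())  # |E|
num_paths = count_paths(EDGES, "s", "t")                           # |X|

if __name__ == "__main__":
    for horizon in (1_000, 4_000, 16_000):
        print(horizon, regret_scale(num_edges, horizon, num_paths))
```

Because the expression grows with the square root of T, quadrupling the number of rounds only doubles it, and because |X| enters through a logarithm, even a DAG with exponentially many paths contributes only a factor polynomial in the graph size.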

Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
