Arxiv

An Optimistic Algorithm for Online CMDPs with Anytime Adversarial Constraints

  • Online safe reinforcement learning is crucial in dynamic environments like autonomous driving, robotics, and cybersecurity.
  • Existing methods for constrained Markov decision processes (CMDPs) struggle in adversarial settings where the constraints are unknown and time-varying.
  • The Optimistic Mirror Descent Primal-Dual (OMDPD) algorithm is introduced to handle online CMDPs with anytime adversarial constraints.
  • OMDPD achieves optimal regret of O(sqrt(K)) and strong constraint violation of O(sqrt(K)) without requiring a known strictly safe policy. This gives practical guarantees for safe decision-making in adversarial environments.
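
To make the primal-dual idea in the summary above more concrete, here is a minimal toy sketch in Python. It is not the paper's OMDPD algorithm: it assumes a single-state problem with three actions, a hypothetical fixed reward vector, randomly drawn constraint costs standing in for an adversary, and K read as the number of rounds. All names (`omd_step`, `run_toy_primal_dual`, `hint`, `lam`) are illustrative. The policy is updated with an entropic optimistic mirror step on the Lagrangian, the dual variable is updated by projected ascent, and "strong" violation is taken to mean the sum of positive parts of the per-round constraint cost, so safe rounds cannot cancel unsafe ones.

```python
import math
import random


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def omd_step(base, grad, eta):
    """Entropic mirror-ascent step: reweight `base` by exp(eta * grad)."""
    shift = max(grad)  # subtract the max for numerical stability
    return normalize([b * math.exp(eta * (g - shift)) for b, g in zip(base, grad)])


def run_toy_primal_dual(num_rounds=2000, seed=0):
    rng = random.Random(seed)
    rewards = [1.0, 0.6, 0.2]          # hypothetical fixed reward per action
    eta = 1.0 / math.sqrt(num_rounds)  # step size of order 1/sqrt(K)
    base = [1.0 / 3] * 3               # mirror-descent iterate (a distribution)
    hint = [0.0] * 3                   # optimistic guess of the next gradient
    lam = 0.0                          # dual variable (Lagrange multiplier)
    total_reward = 0.0
    strong_violation = 0.0

    for _ in range(num_rounds):
        # Optimistic step: play the iterate shifted by the predicted gradient.
        policy = omd_step(base, hint, eta)

        # Constraint costs are revealed only after the policy is played.
        # The requirement is expected cost <= 0; action 0 is always unsafe here.
        costs = [rng.choice([0.5, 0.3]), rng.choice([-0.1, 0.1]), -0.2]

        expected_cost = sum(p * c for p, c in zip(policy, costs))
        total_reward += sum(p * r for p, r in zip(policy, rewards))
        # Strong violation: accumulate positive parts only (no cancellation).
        strong_violation += max(0.0, expected_cost)

        # Gradient of the Lagrangian (reward minus lam times cost) in the policy.
        grad = [r - lam * c for r, c in zip(rewards, costs)]
        base = omd_step(base, grad, eta)
        hint = grad  # predict that the next gradient resembles this one
        # Dual ascent, projected onto lam >= 0.
        lam = max(0.0, lam + eta * expected_cost)

    return policy, lam, total_reward, strong_violation
```

Calling `run_toy_primal_dual()` returns the final policy, the final dual variable, the cumulative expected reward, and the cumulative strong violation. Because the sketch never resets or clips the violation sum, it shows why a strong-violation bound is harder to satisfy than one where negative costs can offset positive ones.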
