Arxiv

Online Episodic Convex Reinforcement Learning

  • Researchers study online learning in episodic finite-horizon Markov decision processes (MDPs) with convex objective functions, a setting known as the concave utility reinforcement learning (CURL) problem.
  • This setting extends RL from linear to convex losses on the state-action distribution induced by the agent's policy. Because the CURL objective is non-linear, it requires new algorithmic approaches.
  • The work introduces the first algorithm that achieves near-optimal regret bounds for online CURL without prior knowledge of the transition function. It combines an online mirror descent algorithm with an exploration bonus.
  • A bandit version of CURL is also addressed for the first time, with a sub-linear regret bound obtained by adapting techniques from bandit convex optimization to the MDP setting.
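
To make the terms in the summary above concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm. It only illustrates (1) the state-action distribution a policy induces in a small finite-horizon MDP, (2) a linear loss versus a convex loss on that distribution, and (3) one textbook mirror descent update on a probability simplex (the exponentiated-gradient form). All function names, the toy MDP, and the choice of negative entropy as the convex loss are assumptions made for illustration. The sketch also assumes the transition function is known and the same at every step, which the paper's setting does not require.

```python
import math


def state_action_distribution(P, policy, mu0, horizon):
    """Return mu[h][s][a]: probability of being in state s and taking action a at step h.

    P[s][a][s2] is the transition probability (assumed known and time-homogeneous),
    policy[h][s][a] is the action probability at step h, mu0[s] the initial state law.
    """
    n_states, n_actions = len(P), len(P[0])
    # Step 0: sample the state from mu0, then the action from the policy.
    mu = [[[mu0[s] * policy[0][s][a] for a in range(n_actions)]
           for s in range(n_states)]]
    for h in range(1, horizon):
        prev = mu[-1]
        # Push the previous state-action distribution through the transitions.
        next_state = [sum(prev[s][a] * P[s][a][s2]
                          for s in range(n_states) for a in range(n_actions))
                      for s2 in range(n_states)]
        mu.append([[next_state[s2] * policy[h][s2][a] for a in range(n_actions)]
                   for s2 in range(n_states)])
    return mu


def linear_loss(mu, cost):
    """Standard RL objective: expected cumulative cost, linear in mu."""
    return sum(layer[s][a] * cost[s][a]
               for layer in mu
               for s in range(len(layer)) for a in range(len(layer[s])))


def negative_entropy_loss(mu):
    """A convex (non-linear) loss in mu: negative entropy of each step's state marginal.

    Minimizing it favors policies that spread visits across states.
    """
    total = 0.0
    for layer in mu:
        for row in layer:
            d = sum(row)  # state marginal, a linear function of mu
            if d > 0:
                total += d * math.log(d)
    return total


def exponentiated_gradient_step(x, grad, eta):
    """One mirror descent step on the simplex with the entropy mirror map."""
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical toy MDP: 2 states, action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
mu0 = [1.0, 0.0]
horizon = 2
uniform = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(horizon)]
stay = [[[1.0, 0.0], [1.0, 0.0]] for _ in range(horizon)]

for name, pi in [("uniform", uniform), ("stay", stay)]:
    mu = state_action_distribution(P, pi, mu0, horizon)
    print(name, negative_entropy_loss(mu))
```

In this toy example the uniform policy reaches both states at the second step, so it gets a lower negative-entropy loss than the policy that never moves. A linear cost cannot express this kind of preference over how visits are spread, which is the sense in which CURL generalizes the usual RL objective.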
