Source: arXiv

Provable Sim-to-Real Transfer via Offline Domain Randomization

  • Reinforcement-learning (RL) policies trained in simulation often perform poorly when transferred to the real system.
  • Domain randomization (DR) reduces this sim-to-real gap by training policies across many simulators whose dynamics parameters are randomized.
  • Offline Domain Randomization (ODR) instead uses offline data collected on the real system to fit a distribution over simulator parameters.
  • Algorithms such as DROPO have shown notable empirical gains with ODR, but its theoretical foundations have remained largely unexplored.
  • The paper formalizes ODR as maximum-likelihood estimation over a parametric family of simulators and proves that this estimator is consistent under certain conditions: as the offline dataset grows, ODR converges to the true dynamics.
  • It also derives gap bounds comparing ODR with standard DR, showing that ODR's sim-to-real error bound is significantly tighter than that of uniform DR in both finite-simulator and continuous settings.
  • The authors also propose E-DROPO, a variant of DROPO that adds an entropy bonus to its objective. The bonus keeps the fitted parameter distribution from collapsing in variance, preserving broader randomization and enabling more robust zero-shot transfer to real systems.
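
To make the maximum-likelihood view of ODR concrete, here is a minimal sketch on a hypothetical one-dimensional system s' = s + θ·a + noise, where the simulator family is indexed by a single parameter θ. It fits a Gaussian N(μ, σ²) over θ by maximizing the likelihood of offline "real" transitions with θ marginalized out. The marginal is exact here only because the toy is linear-Gaussian; DROPO-style methods estimate this likelihood by sampling simulators instead. Because the toy real system has one fixed θ, the unregularized fit drives σ toward zero (the variance collapse mentioned above). The `entropy_weight` term adds β·H(N(μ, σ²)) as an illustrative stand-in for an entropy bonus, not the paper's E-DROPO objective. The system, noise level, grid search, and per-sample normalization of the bonus are all assumptions.

```python
import math
import random


def make_offline_data(theta_true, n, noise_std, seed=0):
    """Hypothetical real system: s' = s + theta_true * a + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        s_next = s + theta_true * a + rng.gauss(0.0, noise_std)
        data.append((s, a, s_next))
    return data


def mean_log_likelihood(data, mu, sigma, noise_std):
    """Mean log p(s' | s, a) under theta ~ N(mu, sigma^2), theta marginalized out.

    In this linear-Gaussian toy the marginal is exact:
    s' - s ~ N(mu * a, sigma^2 * a^2 + noise_std^2).
    """
    total = 0.0
    for s, a, s_next in data:
        var = sigma ** 2 * a ** 2 + noise_std ** 2
        resid = (s_next - s) - mu * a
        total += -0.5 * (math.log(2.0 * math.pi * var) + resid ** 2 / var)
    return total / len(data)


def best_mu(data, sigma, noise_std):
    """Exact maximizer over mu for a fixed sigma (weighted least squares)."""
    num = den = 0.0
    for s, a, s_next in data:
        w = 1.0 / (sigma ** 2 * a ** 2 + noise_std ** 2)
        num += w * a * (s_next - s)
        den += w * a * a
    return num / den


def fit_odr(data, noise_std, entropy_weight=0.0, n_grid=200,
            sigma_min=1e-3, sigma_max=2.0):
    """Fit N(mu, sigma^2) over theta by maximizing the mean log-likelihood.

    A positive entropy_weight adds entropy_weight * H(N(mu, sigma^2)), an
    illustrative stand-in for an entropy bonus (not the paper's exact objective).
    sigma is searched on a log-spaced grid; mu is solved exactly per sigma.
    """
    best = None
    for k in range(n_grid):
        sigma = sigma_min * (sigma_max / sigma_min) ** (k / (n_grid - 1))
        mu = best_mu(data, sigma, noise_std)
        # Differential entropy of a 1-D Gaussian: log(sigma) + 0.5 * log(2*pi*e).
        entropy = math.log(sigma) + 0.5 * math.log(2.0 * math.pi * math.e)
        score = mean_log_likelihood(data, mu, sigma, noise_std) + entropy_weight * entropy
        if best is None or score > best[0]:
            best = (score, mu, sigma)
    return best[1], best[2]


if __name__ == "__main__":
    # Consistency: the estimate of theta should approach 1.5 as n grows.
    for n in (50, 500, 5000):
        data = make_offline_data(theta_true=1.5, n=n, noise_std=0.1, seed=n)
        mu, sigma = fit_odr(data, noise_std=0.1)
        print(f"n={n:5d}  mu={mu:.3f}  sigma={sigma:.3f}")
    # Entropy bonus: sigma stays well away from zero.
    data = make_offline_data(theta_true=1.5, n=2000, noise_std=0.1)
    mu, sigma = fit_odr(data, noise_std=0.1, entropy_weight=0.5)
    print(f"with entropy bonus  mu={mu:.3f}  sigma={sigma:.3f}")
```

Running the script should show μ approaching the true value 1.5 as n grows, and a noticeably wider σ when the entropy bonus is switched on; exact numbers depend on the random seed. Solving μ in closed form and searching σ on a grid keeps the sketch dependency-free; a real ODR pipeline would optimize over many simulator parameters with a sampling-based likelihood.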
