Reinforcement-learning agents trained in simulation often fail to transfer to the real world, because the simulator only approximates the true dynamics.
Domain randomization (DR) narrows this sim-to-real gap by training policies across many simulators whose dynamics parameters are drawn from a randomization distribution.
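For concreteness, the following is a minimal sketch of such a DR training loop. The hooks `make_sim_env` and `update_policy`, and the uniform sampling box, are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def train_with_domain_randomization(policy, update_policy, make_sim_env,
                                    param_low, param_high, n_episodes=1000,
                                    seed=0):
    """Train a policy while resampling simulator dynamics every episode.

    `make_sim_env(params)` and `update_policy(policy, rollout)` are
    hypothetical hooks: the first builds a simulator with the given dynamics
    parameters (masses, friction, ...), the second runs one RL update.
    """
    rng = np.random.default_rng(seed)
    param_low = np.asarray(param_low, dtype=float)
    param_high = np.asarray(param_high, dtype=float)

    for _ in range(n_episodes):
        # Uniform DR: draw this episode's dynamics parameters from a fixed box.
        params = rng.uniform(param_low, param_high)
        env = make_sim_env(params)

        obs, done, rollout = env.reset(), False, []
        while not done:
            action = policy(obs)
            next_obs, reward, done, _ = env.step(action)
            rollout.append((obs, action, reward, next_obs))
            obs = next_obs

        update_policy(policy, rollout)  # any RL algorithm (PPO, SAC, ...)
    return policy
```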
Offline Domain Randomization (ODR) instead fits the distribution over simulator parameters to offline data collected from the real system before training.
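A minimal sketch of this fitting step is given below: a diagonal Gaussian over simulator parameters is fitted by maximizing a Monte Carlo estimate of the likelihood of logged real transitions. The hook `simulate_step`, the Gaussian observation noise, and the Nelder-Mead optimizer are assumptions made here for illustration; they are not the specific choices of DROPO or the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fit_odr_distribution(real_transitions, simulate_step, dim,
                         noise_var=1e-3, n_mc=32, seed=0):
    """Fit a diagonal Gaussian over simulator parameters to offline real data.

    `real_transitions` is a list of (state, action, next_state) tuples logged
    on the real system; `simulate_step(state, action, params)` is a
    hypothetical hook returning the simulator's next-state prediction.
    """
    rng = np.random.default_rng(seed)

    def neg_log_likelihood(phi):
        mean, log_std = phi[:dim], phi[dim:]
        std = np.exp(log_std)
        total = 0.0
        for s, a, s_next in real_transitions:
            # Monte Carlo estimate of p(s_next | s, a): average the Gaussian
            # observation likelihood of the simulated next state over
            # dynamics parameters sampled from the current distribution.
            params = mean + std * rng.standard_normal((n_mc, dim))
            preds = np.array([simulate_step(s, a, p) for p in params])
            sq_err = np.sum((preds - np.asarray(s_next)) ** 2, axis=1)
            liks = np.exp(-0.5 * sq_err / noise_var)
            liks /= (2.0 * np.pi * noise_var) ** (len(s_next) / 2.0)
            total += np.log(liks.mean() + 1e-300)
        return -total

    # Gradient-free optimization avoids differentiating through the simulator.
    phi0 = np.concatenate([np.zeros(dim), np.full(dim, np.log(0.1))])
    result = minimize(neg_log_likelihood, phi0, method="Nelder-Mead")
    mean, log_std = result.x[:dim], result.x[dim:]
    return mean, np.exp(log_std)
```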
Algorithms such as DROPO have shown notable empirical gains from this approach, but the theoretical foundations of ODR remain largely unexplored.
This work formalizes ODR as maximum-likelihood estimation over a parametric family of simulators and proves that the estimator is consistent under suitable regularity and identifiability conditions.
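One plausible way to write this objective is sketched below; the notation ($\nu_\phi$ for the parameter distribution, $p_\xi$ for the simulator family) is assumed here and not taken verbatim from the paper.

```latex
% Given N offline real transitions (s_i, a_i, s'_i), a parametric simulator
% family {p_xi} and a distribution nu_phi over simulator parameters xi:
\[
  \hat{\phi}_N \in \arg\max_{\phi}\;
  \frac{1}{N}\sum_{i=1}^{N}
  \log \int p_{\xi}\!\left(s'_i \mid s_i, a_i\right)\,\nu_{\phi}(\xi)\,\mathrm{d}\xi ,
\]
% so consistency means \hat{\phi}_N \to \phi^{\star} as N \to \infty whenever
% the true dynamics are realizable and the family is identifiable.
```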
The analysis shows that the ODR estimate converges to the true dynamics as the offline dataset grows, and it derives sim-to-real error bounds that are compared against those of standard DR.
These gap bounds show that ODR's sim-to-real error is markedly tighter than that of uniform DR, in both the finite-simulator and continuous-parameter settings.
Finally, E-DROPO, a new variant of DROPO that adds an entropy bonus to the fitted parameter distribution, is introduced; the bonus keeps the randomization broad, prevents variance collapse, and yields more robust zero-shot transfer in real-world applications.
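To illustrate the entropy-bonus idea (not the exact E-DROPO objective), the sketch below penalizes the fitting loss from the earlier `neg_log_likelihood` example with the entropy of the diagonal Gaussian over simulator parameters; the weight `beta` is a hypothetical hyperparameter.

```python
import numpy as np

def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian with the given log-stds."""
    d = len(log_std)
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + np.sum(log_std)

def entropy_regularized_nll(neg_log_likelihood, phi, dim, beta=0.1):
    """Negative log-likelihood minus a weighted entropy bonus.

    `beta` is a hypothetical weight: larger values keep the fitted parameter
    distribution wider, guarding against variance collapse during fitting.
    """
    log_std = np.asarray(phi)[dim:]
    return neg_log_likelihood(phi) - beta * gaussian_entropy(log_std)
```

Minimizing this regularized loss in place of the plain negative log-likelihood trades a small amount of data fit for a wider randomization distribution, which is the mechanism credited here for more robust zero-shot transfer.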