Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms aim to improve sample efficiency by using a learned dynamics model to generate synthetic state-transition data that supplements real experience.
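To make the mechanism concrete, the following is a minimal, self-contained sketch of the Dyna-style loop: collect real transitions, fit a dynamics model, branch short synthetic rollouts from real states, and pool both kinds of data for policy updates. The toy environment, the linear least-squares model, and the hand-coded `behave` policy are illustrative assumptions and not the algorithms evaluated in the study, which typically pair a learned neural dynamics model with an off-policy actor-critic learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D environment standing in for a Gym/DMC task (hypothetical, for illustration).
def env_step(s, a):
    """True (unknown to the agent) dynamics and reward."""
    s_next = 0.9 * s + 0.1 * a + rng.normal(0.0, 0.01)
    reward = -abs(s_next)                       # reward for driving the state toward 0
    return s_next, reward

def model_step(s, a, theta):
    """One-step prediction with the learned linear model, used for synthetic rollouts."""
    s_next = theta[0] * s + theta[1] * a
    return s_next, -abs(s_next)

def fit_model(transitions):
    """Least-squares fit of s' ~ theta_s * s + theta_a * a on real transitions."""
    X = np.array([[s, a] for s, a, _, _ in transitions])
    y = np.array([s_next for _, _, s_next, _ in transitions])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def behave(s):
    """Crude exploratory policy; a real agent would learn this instead."""
    return float(np.clip(-s + rng.normal(0.0, 0.3), -1.0, 1.0))

real_buffer, synth_buffer = [], []              # real vs. model-generated transitions
theta = np.zeros(2)                             # learned dynamics parameters
s = 1.0

for step in range(200):
    # 1) Interact with the real environment and store the transition.
    a = behave(s)
    s_next, r = env_step(s, a)
    real_buffer.append((s, a, s_next, r))
    s = s_next

    if len(real_buffer) >= 20:
        # 2) Refit the dynamics model on all real data collected so far.
        theta = fit_model(real_buffer)

        # 3) Branch short synthetic rollouts from states in the real buffer;
        #    short horizons limit compounding model error.
        for i in rng.choice(len(real_buffer), size=5, replace=False):
            s_m = real_buffer[i][0]
            for _ in range(3):
                a_m = behave(s_m)
                s_m_next, r_m = model_step(s_m, a_m, theta)
                synth_buffer.append((s_m, a_m, s_m_next, r_m))
                s_m = s_m_next

# 4) Policy/value updates would then draw minibatches from both buffers; it is this
#    synthetic augmentation that the study finds helpful on Gym but harmful on most DMC tasks.
print(f"real transitions: {len(real_buffer)}, synthetic transitions: {len(synth_buffer)}")
```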
A recent study identifies a substantial performance gap when DMBRL algorithms are evaluated across different benchmark environments with proprioceptive (state-based) observations.
While DMBRL algorithms perform well on OpenAI Gym tasks, their performance drops significantly on the DeepMind Control Suite (DMC), even though the two benchmarks contain similar tasks and use similar physics backends.
Moreover, modern techniques designed to address key issues in this setting do not consistently improve performance across all environments.
Adding synthetic rollouts to the training process, a core aspect of Dyna-style algorithms, actually degrades performance in most DMC environments.
The study sheds light on persistent challenges in model-based RL and highlights that there is no 'free lunch' when evaluating performance across different benchmarks.