Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms face a substantial performance gap when applied across different benchmark environments.
While DMBRL algorithms perform well in OpenAI Gym, their performance drops significantly in the DeepMind Control Suite (DMC) with proprioceptive observations.
Moreover, modern techniques designed to mitigate the issues that arise in these settings do not consistently improve performance across environments.
Adding synthetic rollouts to the training process, the very mechanism that defines Dyna-style algorithms, significantly degrades performance in most DMC environments.
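To make the mechanism concrete, the sketch below illustrates a generic Dyna-style update: short synthetic rollouts are branched from a learned world model and mixed with real transitions when training an off-policy agent. This is a minimal illustration, not the implementation evaluated here; the `ReplayBuffer` class, the `agent.act`/`agent.update` and `world_model.step` interfaces, and the rollout horizon and mixing ratio are all assumed placeholders.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def dyna_update(agent, world_model, real_buffer, model_buffer,
                rollout_horizon=5, num_rollouts=32,
                real_ratio=0.5, batch_size=256):
    """One Dyna-style training step: branch short synthetic rollouts from
    the learned model, then update the off-policy agent on a mixture of
    real and model-generated transitions. All interfaces are assumed."""
    # 1. Generate synthetic rollouts, branching from real states.
    for (state, *_rest) in real_buffer.sample(num_rollouts):
        s = state
        for _ in range(rollout_horizon):
            a = agent.act(s)                          # assumed policy interface
            s_next, r, done = world_model.step(s, a)  # assumed model interface
            model_buffer.add((s, a, r, s_next, done))
            if done:
                break
            s = s_next

    # 2. Train on a real/synthetic mixture; real_ratio controls the blend.
    n_real = int(batch_size * real_ratio)
    batch = (real_buffer.sample(n_real)
             + model_buffer.sample(batch_size - n_real))
    agent.update(batch)                               # assumed off-policy update
```

Under these assumptions, setting `real_ratio=1.0` (or skipping step 1 entirely) recovers a plain model-free off-policy update, which is the natural ablation behind the finding above.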