ImagineBench is a benchmark for evaluating offline RL algorithms that learn from both real rollouts and LLM-imaginary rollouts.
It addresses offline RL's dependence on costly real-world interaction data by using large language models (LLMs) to generate synthetic experience for mastering novel tasks.
The benchmark provides datasets containing both environment-collected and LLM-imaginary rollouts, environments spanning diverse domains, and natural-language task instructions at varying levels of complexity.
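To make the dataset layout concrete, here is a minimal sketch of how rollouts from the two sources might be represented and mixed into a training batch. The `Rollout` container, the `sample_mixed_batch` helper, and the 50/50 mixing ratio are illustrative assumptions, not ImagineBench's actual API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Rollout:
    """One trajectory paired with its natural-language task instruction."""
    observations: np.ndarray  # (T, obs_dim)
    actions: np.ndarray       # (T, act_dim)
    rewards: np.ndarray       # (T,)
    instruction: str          # natural-language description of the task
    source: str               # "real" (environment-collected) or "imaginary" (LLM-generated)

def sample_mixed_batch(real, imaginary, batch_size, real_ratio=0.5, seed=None):
    """Draw a batch of rollouts mixing real and LLM-imaginary data.

    real_ratio controls how much of the batch comes from environment
    rollouts; the remainder is filled with LLM-imaginary rollouts.
    """
    rng = np.random.default_rng(seed)
    n_real = int(batch_size * real_ratio)
    batch = [real[i] for i in rng.integers(len(real), size=n_real)]
    batch += [imaginary[i] for i in rng.integers(len(imaginary), size=batch_size - n_real)]
    rng.shuffle(batch)  # avoid ordering artifacts within the batch
    return batch

# Hypothetical usage with toy data standing in for the benchmark's datasets.
rng = np.random.default_rng(0)
def toy(source, n=16, T=10):
    return [Rollout(rng.normal(size=(T, 4)), rng.normal(size=(T, 2)),
                    rng.normal(size=T), "stack the blocks", source)
            for _ in range(n)]

batch = sample_mixed_batch(toy("real"), toy("imaginary"), batch_size=8, seed=0)
print(sum(r.source == "real" for r in batch))  # -> 4 with real_ratio=0.5
```

Mixing the two sources at a fixed ratio is only one simple way an offline RL algorithm could consume both kinds of data; how best to weight or filter the imaginary rollouts is precisely the open question the benchmark probes.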
Evaluating state-of-the-art offline RL algorithms on ImagineBench shows that further algorithmic advances are needed to better exploit LLM-imaginary rollouts and improve performance on unseen tasks.