ImagineBench is a benchmark for evaluating offline RL algorithms that learn from both real rollouts and LLM-imaginary rollouts.
It addresses offline RL's dependence on costly real-world interaction data by using large language models (LLMs) to generate synthetic experience for mastering novel tasks.
The benchmark provides datasets containing both environment-collected and LLM-imaginary rollouts, environments spanning diverse domains, and natural-language task instructions at varying levels of complexity.
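To make the dataset layout concrete, here is a minimal sketch of how rollouts from the two sources might be represented and mixed into a training batch. The `Rollout` container, the `sample_mixed_batch` helper, and the 50/50 mixing ratio are illustrative assumptions, not ImagineBench's actual API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Rollout:
    """One trajectory paired with its natural-language task instruction."""
    observations: np.ndarray  # (T, obs_dim)
    actions: np.ndarray       # (T, act_dim)
    rewards: np.ndarray       # (T,)
    instruction: str          # natural-language description of the task
    source: str               # "real" (environment-collected) or "imaginary" (LLM-generated)

def sample_mixed_batch(real, imaginary, batch_size, real_ratio=0.5, seed=None):
    """Draw a batch of rollouts mixing real and LLM-imaginary data.

    real_ratio controls how much of the batch comes from environment
    rollouts; the remainder is filled with LLM-imaginary rollouts.
    """
    rng = np.random.default_rng(seed)
    n_real = int(batch_size * real_ratio)
    batch = [real[i] for i in rng.integers(len(real), size=n_real)]
    batch += [imaginary[i] for i in rng.integers(len(imaginary), size=batch_size - n_real)]
    rng.shuffle(batch)  # avoid ordering artifacts within the batch
    return batch

# Hypothetical usage with toy data standing in for the benchmark's datasets.
rng = np.random.default_rng(0)
def toy(source, n=16, T=10):
    return [Rollout(rng.normal(size=(T, 4)), rng.normal(size=(T, 2)),
                    rng.normal(size=T), "stack the blocks", source)
            for _ in range(n)]

batch = sample_mixed_batch(toy("real"), toy("imaginary"), batch_size=8, seed=0)
print(sum(r.source == "real" for r in batch))  # -> 4 with real_ratio=0.5
```

Mixing the two sources at a fixed ratio is only one simple way an offline RL algorithm could consume both kinds of data; how best to weight or filter the imaginary rollouts is precisely the open question the benchmark probes.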
Evaluating state-of-the-art offline RL algorithms on ImagineBench shows that further algorithmic advances are needed to better exploit LLM-imaginary rollouts and improve performance on unseen tasks.