Reinforcement learning (RL) can emerge at inference time in large language models (LLMs), a phenomenon known as in-context RL (ICRL).
A novel multi-round prompting framework, called ICRL prompting, is proposed for completing tasks with LLMs. In each round, the LLM's response and its scalar reward are appended to the context, so response quality improves as the context grows and inference time effectively maximizes the reward signal, analogous to an RL algorithm; a minimal sketch of the loop is given after this summary.
ICRL prompting yields significant performance improvements on benchmarks such as Game of 24, creative writing, and ScienceWorld, even when the LLM generates its own reward signals.
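
The sketch below illustrates one plausible form of such a multi-round loop, assuming the context accumulates (response, reward) pairs across rounds; the functions `query_llm` and `score` are hypothetical placeholders, not the paper's actual API, and the prompt format is an assumption made purely for illustration.

```python
# Minimal sketch of an ICRL-style prompting loop (illustrative only).
# `query_llm` and `score` are hypothetical stand-ins for a chat/completion
# API and a reward function (external verifier or the LLM itself).

def icrl_prompting(task_prompt: str, num_rounds: int = 5) -> str:
    """Multi-round prompting: each round's response and scalar reward are
    appended to the context, so later responses can improve on earlier ones."""
    context = task_prompt
    best_response, best_reward = "", float("-inf")

    for round_idx in range(num_rounds):
        response = query_llm(context)          # LLM produces an attempt
        reward = score(task_prompt, response)  # scalar reward for the attempt

        # Grow the context with the attempt and its reward, analogous to an
        # RL trajectory accumulating (action, reward) pairs.
        context += (
            f"\n\nAttempt {round_idx + 1}:\n{response}"
            f"\nReward: {reward}"
            "\nPlease produce an improved attempt with a higher reward."
        )
        if reward > best_reward:
            best_response, best_reward = response, reward

    return best_response


def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat/completion API."""
    raise NotImplementedError


def score(task: str, response: str) -> float:
    """Placeholder reward: an external verifier, or the LLM scoring itself."""
    raise NotImplementedError
```

In this sketch the reward could just as well be produced by the LLM itself (self-generated reward), matching the setting in which the reported improvements still hold.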