Reinforcement learning (RL) can emerge at inference time in large language models (LLMs), a phenomenon known as in-context RL (ICRL).
A novel multi-round prompting framework, called ICRL prompting, is proposed for completing tasks with LLMs. In each round, the LLM's response and its scalar reward are appended to the context, so response quality improves as the context grows and inference time effectively maximizes the reward signal, analogous to an RL algorithm; a minimal sketch of the loop is given after this summary.
ICRL prompting yields significant performance improvements on benchmarks such as Game of 24, creative writing, and ScienceWorld, even when the LLM generates its own reward signals.
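
The sketch below illustrates one plausible form of such a multi-round loop, assuming the context accumulates (response, reward) pairs across rounds; the functions `query_llm` and `score` are hypothetical placeholders, not the paper's actual API, and the prompt format is an assumption made purely for illustration.

```python
# Minimal sketch of an ICRL-style prompting loop (illustrative only).
# `query_llm` and `score` are hypothetical stand-ins for a chat/completion
# API and a reward function (external verifier or the LLM itself).

def icrl_prompting(task_prompt: str, num_rounds: int = 5) -> str:
    """Multi-round prompting: each round's response and scalar reward are
    appended to the context, so later responses can improve on earlier ones."""
    context = task_prompt
    best_response, best_reward = "", float("-inf")

    for round_idx in range(num_rounds):
        response = query_llm(context)          # LLM produces an attempt
        reward = score(task_prompt, response)  # scalar reward for the attempt

        # Grow the context with the attempt and its reward, analogous to an
        # RL trajectory accumulating (action, reward) pairs.
        context += (
            f"\n\nAttempt {round_idx + 1}:\n{response}"
            f"\nReward: {reward}"
            "\nPlease produce an improved attempt with a higher reward."
        )
        if reward > best_reward:
            best_response, best_reward = response, reward

    return best_response


def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat/completion API."""
    raise NotImplementedError


def score(task: str, response: str) -> float:
    """Placeholder reward: an external verifier, or the LLM scoring itself."""
    raise NotImplementedError
```

In this sketch the reward could just as well be produced by the LLM itself (self-generated reward), matching the setting in which the reported improvements still hold.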