techminis

A naukri.com initiative

Image Credit: Arxiv

Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

  • In-context learning (ICL) is the ability of large language models (LLMs) to perform new tasks from examples supplied in the prompt, without any weight updates.
  • Research finds that Gemma-2 2B uses a two-step strategy, contextualize-then-aggregate, for task information assembly.
  • In the lower layers, the model builds representations of individual few-shot examples, each contextualized by the examples that precede it.
  • In the higher layers, these representations are aggregated to identify the task and prepare predictions.
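The two-step flow above can be illustrated with a toy sketch. This is not the paper's actual circuit analysis of Gemma-2 2B; the function names, the running-mean contextualization, and the mean-pooling aggregation are all illustrative assumptions standing in for what the lower and higher layers compute.

```python
def contextualize(example_vecs):
    """Lower-layer step (toy version): each few-shot example's vector is
    blended with the average of the examples that precede it."""
    contextualized = []
    for i, vec in enumerate(example_vecs):
        prev = example_vecs[:i]
        if prev:
            ctx = [sum(vals) / len(prev) for vals in zip(*prev)]
        else:
            ctx = [0.0] * len(vec)  # first example has no preceding context
        contextualized.append([0.5 * v + 0.5 * c for v, c in zip(vec, ctx)])
    return contextualized

def aggregate(contextualized):
    """Higher-layer step (toy version): pool the per-example
    representations into a single task vector."""
    n = len(contextualized)
    return [sum(vals) / n for vals in zip(*contextualized)]

# Three hypothetical few-shot example embeddings (2-dimensional for brevity)
examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
task_vector = aggregate(contextualize(examples))
print(task_vector)
```

The key point the sketch captures is the ordering: contextualization happens per example and only looks backward, while aggregation runs once over all examples to summarize the task.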
