Interpretability researchers have typically studied the MLP neurons of language models through either their activating contexts or their output weight vectors, neglecting the interaction between a neuron's input and output weights.
A study examined the cosine similarity between each neuron's input and output weight vectors across 12 models, finding that enrichment neurons are prevalent in early-to-middle layers while depletion neurons dominate later layers. Enrichment neurons reinforce concept representations and aid factual recall in the early stages, whereas later layers lean towards depletion, suppressing certain components of their input.
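As a rough illustration, the comparison can be sketched in a few lines of PyTorch. The snippet below loads GPT-2 (an assumed choice for illustration, not necessarily one of the 12 models studied) and computes, per layer, the cosine similarity between each MLP neuron's input and output weight vectors; labeling positive similarity as enrichment-like and negative as depletion-like is an assumed reading of the study's categories, not its exact criterion.

```python
# Minimal sketch (not the study's code): per-neuron cosine similarity between
# input and output MLP weights in GPT-2, assuming Hugging Face's Conv1D layout
# where mlp.c_fc.weight is (d_model, d_mlp) and mlp.c_proj.weight is (d_mlp, d_model).
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for layer_idx, block in enumerate(model.transformer.h):
    w_in = block.mlp.c_fc.weight    # (d_model, d_mlp): column i is neuron i's input direction
    w_out = block.mlp.c_proj.weight  # (d_mlp, d_model): row i is neuron i's output direction

    # Cosine similarity between each neuron's input and output weight vectors.
    cos = torch.nn.functional.cosine_similarity(w_in.T, w_out, dim=-1)  # (d_mlp,)

    # Assumed labeling for illustration: positive similarity reinforces the input
    # direction ("enrichment-like"), negative similarity subtracts it ("depletion-like").
    frac_enrich = (cos > 0).float().mean().item()
    frac_deplete = (cos < 0).float().mean().item()
    print(f"layer {layer_idx:2d}: enrichment-like {frac_enrich:.2f}, depletion-like {frac_deplete:.2f}")
```

Under this reading, one would expect the printed enrichment-like fraction to be higher in early-to-middle layers and the depletion-like fraction to grow in later layers.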
This input-output perspective complements activation-based analyses and approaches that treat input and output weights separately when interpreting neural network behavior.