<ul><li>Local Loss Landscape Decomposition (L3D) is a new decomposition method for understanding the circuits that models employ.</li><li>L3D identifies a set of low-rank subnetworks in parameter space that can reconstruct the gradient of the loss between any sample's output and a reference output vector.</li><li>On progressively more challenging toy models, L3D recovered the associated subnetworks.</li><li>L3D was also applied to a transformer model and a convolutional neural network, demonstrating its potential to identify interpretable and relevant circuits in parameter space.</li></ul>
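
To make the second point more concrete, here is a minimal sketch of what "reconstructing loss gradients from a few directions in parameter space" could look like. It is not the paper's algorithm: the toy linear model, the squared-error loss, the function name `loss_gradient`, and the choice of `k` are all assumptions made for illustration, and a plain truncated SVD is swapped in for L3D's own decomposition, so the directions it finds are not constrained to be low-rank or sparsely active.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model y = W @ x (not a model from the paper).
d_in, d_out, n_samples = 6, 4, 50
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n_samples, d_in))


def loss_gradient(W, x, x_ref):
    """Gradient w.r.t. W of 0.5 * ||W x - y_ref||^2.

    y_ref = W @ x_ref is the reference output and is treated as a constant.
    """
    y = W @ x
    y_ref = W @ x_ref
    return np.outer(y - y_ref, x)  # shape (d_out, d_in)


# Pair every sample with a randomly chosen reference sample and flatten
# each gradient into a vector in parameter space.
refs = X[rng.permutation(n_samples)]
G = np.stack([loss_gradient(W, x, r).ravel() for x, r in zip(X, refs)])

# Stand-in decomposition (assumption): keep the top-k right singular
# vectors of the gradient matrix as candidate parameter directions.
U, S, Vt = np.linalg.svd(G, full_matrices=False)
k = 8
directions = Vt[:k]  # shape (k, d_out * d_in), orthonormal rows

# Project the gradients onto the k directions and measure what is lost.
G_hat = (G @ directions.T) @ directions
rel_error = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(f"relative reconstruction error with k={k}: {rel_error:.3f}")
```

The relative error shows how much of the gradient signal a small set of parameter directions captures. The method summarized above additionally asks that each direction be low-rank and that only a few directions be active for any given sample, which this sketch does not enforce.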

Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
