Local Loss Landscape Decomposition (L3D) is a new decomposition method for understanding the circuits that neural network models employ.
L3D identifies a set of low-rank subnetworks in the parameter space that can reconstruct the gradient of the loss between any sample's output and a reference output vector.
The method was tested on progressively more challenging toy models, where it recovered the associated subnetworks.
L3D was applied to a transformer model and a convolutional neural network, demonstrating its potential to identify interpretable and relevant circuits in parameter space.
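To make the gradient-reconstruction idea concrete, the following is a minimal sketch and not the L3D procedure itself. It assumes a tiny linear model, a squared-error loss between each sample's output and a fixed reference output, and it swaps in plain power iteration with deflation to find a few parameter-space directions that reconstruct the per-sample gradients. It does not impose any low-rank structure on the directions themselves. All names (`per_sample_gradient`, `top_directions`, `relative_error`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def per_sample_gradient(W, x, ref):
    """Gradient of 0.5 * ||W x - ref||^2 w.r.t. W, flattened row-major.

    Assumption: squared error stands in for the loss between a sample's
    output and the reference output vector.
    """
    out = [dot(row, x) for row in W]
    resid = [o - r for o, r in zip(out, ref)]
    return [res * xi for res in resid for xi in x]


def top_directions(grads, k, iters=100, seed=0):
    """Estimate k orthonormal parameter-space directions that capture the
    per-sample gradients, using power iteration on G^T G with deflation.
    This is a stand-in technique, not the optimization used by L3D.
    """
    rng = random.Random(seed)
    dim = len(grads[0])
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # Apply G^T G to v without forming the matrix explicitly.
            coeffs = [dot(g, v) for g in grads]
            v = [sum(c * g[j] for c, g in zip(coeffs, grads)) for j in range(dim)]
            # Remove components along directions that were already found.
            for d in dirs:
                c = dot(v, d)
                v = [vi - c * di for vi, di in zip(v, d)]
            norm = math.sqrt(dot(v, v))
            v = [vi / norm for vi in v]
        dirs.append(v)
    return dirs


def relative_error(grads, dirs):
    """Relative error of reconstructing every gradient from its projections
    onto the given orthonormal directions."""
    num = den = 0.0
    for g in grads:
        rec = [0.0] * len(g)
        for d in dirs:
            c = dot(g, d)
            rec = [r + c * di for r, di in zip(rec, d)]
        num += sum((a - b) ** 2 for a, b in zip(g, rec))
        den += dot(g, g)
    return math.sqrt(num / den)


# Hypothetical toy setup: a 2x2 linear model whose inputs only ever use the
# first input feature, so only the first column of W receives gradient.
W = [[1.0, 0.5], [-0.5, 2.0]]
ref = [1.0, 1.0]                      # reference output vector
samples = [[a, 0.0] for a in (0.5, 1.0, 1.5, 2.0)]

grads = [per_sample_gradient(W, x, ref) for x in samples]
dirs = top_directions(grads, k=2)

err_rank1 = relative_error(grads, dirs[:1])
err_rank2 = relative_error(grads, dirs)
print("relative error with 1 direction:", err_rank1)
print("relative error with 2 directions:", err_rank2)
```

In this toy setup the four-dimensional parameter space has only two directions that matter, so two recovered directions reconstruct every per-sample gradient up to floating-point error, while one direction leaves a visible residual. The parameters attached to the unused input feature receive zero weight in both directions, which loosely mirrors the idea of a subnetwork occupying only part of parameter space.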