Researchers have investigated the numeric representations that emerge in neural systems and how they can be understood through the lens of interpretable Symbolic Algorithms (SAs).
The study analyzed the raw activity of GRUs, LSTMs, and Transformers trained on numeric tasks and found that this activity can be interpreted as implementing simplified SAs when viewed in appropriately chosen, interpretable subspaces.
The research highlighted the importance of causal interventions for neural network interpretability and demonstrated that recurrent models develop graded, symbol-like number variables in their neural activity.
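As a concrete illustration of the kind of causal intervention used in this line of work, the sketch below patches a hidden state recorded from one input (the "source") into a forward pass on another input (the "base") and inspects how the prediction changes. The toy GRU, the counting-style token sequences, and the chosen patch step are hypothetical stand-ins for illustration, not the study's actual model or data.

```python
# Minimal sketch of an interchange (activation-patching) intervention on a GRU.
# The model, inputs, and patch location are illustrative assumptions.
import torch
import torch.nn as nn

class ToyGRU(nn.Module):
    def __init__(self, vocab_size=10, hidden_size=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, patch_step=None, patch_state=None):
        x = self.embed(tokens)                          # (batch, seq, hidden)
        h = torch.zeros(1, tokens.size(0), x.size(-1))  # initial hidden state
        outputs = []
        for t in range(tokens.size(1)):
            _, h = self.gru(x[:, t:t + 1], h)
            if patch_step is not None and t == patch_step:
                h = patch_state.clone()                 # swap in the donor hidden state
            outputs.append(self.readout(h[-1]))
        return torch.stack(outputs, dim=1), h

model = ToyGRU()
base = torch.tensor([[1, 2, 3, 4]])    # "base" input sequence (illustrative tokens)
source = torch.tensor([[5, 6, 7, 8]])  # "source" input sequence

# Run the source input and record its hidden state after two steps.
with torch.no_grad():
    _, h_source = model(source[:, :2])

# Re-run the base input, patching in the source hidden state at step 1.
with torch.no_grad():
    patched_logits, _ = model(base, patch_step=1, patch_state=h_source)

# If the hidden state causally encodes a number variable, the patched run's
# prediction should track the source's count rather than the base's.
print(patched_logits[:, -1].argmax(-1))
```

The point of such an intervention is that it tests a causal claim: the hypothesized variable is only credited to a hidden state if swapping that state actually changes the behavior in the way the symbolic account predicts.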
Additionally, the study introduced a generalization of Distributed Alignment Search (DAS) and showed that Transformers employ anti-Markovian solutions in the absence of sufficient attention layers.
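To make the DAS idea concrete, the sketch below shows one common formulation of a DAS-style intervention: learn an orthogonal rotation of a hidden representation so that a small block of rotated coordinates aligns with a hypothesized causal variable, then intervene by swapping that block between a base run and a source run. The dimensions, variable names, and the stand-in loss are illustrative assumptions, not the study's generalization or its actual training objective.

```python
# Minimal sketch of a DAS-style subspace intervention with a learned orthogonal rotation.
# Sizes, data, and the loss target are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

hidden_size, subspace_size = 32, 4

# Orthogonally parametrized linear map: rotation(h) = h @ W^T with W orthogonal.
rotation = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))

def das_intervene(h_base, h_source):
    """Swap the first `subspace_size` rotated coordinates of h_base with h_source's."""
    r_base = rotation(h_base)        # rotate into the learned basis
    r_source = rotation(h_source)
    mixed = torch.cat(
        [r_source[..., :subspace_size], r_base[..., subspace_size:]], dim=-1
    )
    # Rotate back; the inverse of an orthogonal matrix is its transpose.
    return mixed @ rotation.weight

# Illustrative single optimization step. Random tensors stand in for recorded hidden
# states; in practice the loss would compare the intervened model's *output* with the
# counterfactual label predicted by the hypothesized symbolic algorithm.
h_base = torch.randn(8, hidden_size)
h_source = torch.randn(8, hidden_size)
counterfactual_target = torch.randn(8, hidden_size)  # placeholder target

optimizer = torch.optim.Adam(rotation.parameters(), lr=1e-3)
loss = ((das_intervene(h_base, h_source) - counterfactual_target) ** 2).mean()
loss.backward()
optimizer.step()
```

Training only the rotation (while freezing the network) is what lets this kind of search ask whether a causal variable is linearly encoded somewhere in the representation, rather than in a single neuron or a fixed axis-aligned subspace.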