Understanding the internal representations of large language models (LLMs) is crucial for interpretability research. InverseScope is a new framework for interpreting neural activations through input inversion. Given a target activation, it defines a distribution over inputs that produce similar activations and analyzes samples from that distribution to infer which features the activation encodes. The framework scales inversion-based interpretability methods to larger models and enables quantitative analysis of internal representations in real-world LLMs.
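
To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of inversion-based interpretation: given a target activation, approximate the conditional input distribution by keeping candidate inputs whose activations are close to the target, then inspect the accepted inputs for shared features. The model name, layer index, similarity threshold, and the simple rejection-style filtering are illustrative assumptions.

```python
# Sketch of inversion-based activation interpretation (illustrative only):
# sample candidate inputs, keep those whose activations resemble a target
# activation, and inspect the accepted inputs to infer encoded features.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"      # assumption: any Hugging Face model with hidden states
LAYER = 6                # assumption: layer whose activation we invert
SIM_THRESHOLD = 0.9      # assumption: cosine-similarity acceptance cutoff

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def activation(text: str) -> torch.Tensor:
    """Return the last-token hidden state at LAYER for the given input."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER]
    return hidden[0, -1]  # activation vector of the final token

def invert(target_text: str, candidates: list[str]) -> list[str]:
    """Approximate the input distribution conditioned on activation
    similarity: keep candidates whose activation is close to the target's."""
    target = activation(target_text)
    accepted = []
    for cand in candidates:
        sim = torch.cosine_similarity(target, activation(cand), dim=0)
        if sim.item() >= SIM_THRESHOLD:
            accepted.append(cand)
    return accepted

# Usage: properties shared by the accepted inputs suggest which features
# the target activation encodes.
similar_inputs = invert(
    "The capital of France is Paris",
    ["The capital of Italy is Rome",
     "Bananas are yellow",
     "The capital of Spain is Madrid"],
)
print(similar_inputs)
```

In practice the candidate pool would be generated or sampled at scale rather than hand-written, and the similarity criterion and acceptance rule would be chosen to match the framework's quantitative analyses; the sketch only conveys the conditional-distribution idea.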