<ul><li>Large Language Models (LLMs) achieve remarkable performance by capturing complex interactions between input features.</li><li>ProxySPEX is an interaction attribution algorithm designed to efficiently discover hierarchical feature interactions in LLMs.</li><li>ProxySPEX outperforms prior methods: it reconstructs LLM outputs more faithfully while using fewer inferences, and it identifies influential features more effectively.</li><li>Experiments demonstrate its effectiveness on high-dimensional datasets and its application to data attribution and mechanistic interpretability tasks.</li></ul>

ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
