Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features.
ProxySPEX is an interaction attribution algorithm designed to efficiently discover hierarchical feature interactions in LLMs.
Compared with prior methods, ProxySPEX reconstructs LLM outputs more faithfully while requiring fewer model inferences, and it identifies influential features more effectively.
Experiments demonstrate ProxySPEX's effectiveness on high-dimensional datasets, as well as its applicability to data attribution and mechanistic interpretability tasks.
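
To make the notions of "feature interaction" and "faithful reconstruction" concrete, the following is a minimal sketch on a toy problem. It is not the ProxySPEX algorithm: it enumerates every mask of a four-feature input, which is exactly the exhaustive cost an inference-efficient method tries to avoid. The function `toy_model` is a hypothetical stand-in for an LLM output under input masking, and the Möbius (inclusion-exclusion) coefficients and the R²-style faithfulness score are illustrative choices, not definitions taken from the text above.

```python
from itertools import combinations, product


def toy_model(mask):
    """Hypothetical stand-in for an LLM output when features are masked (1 = kept)."""
    x0, x1, x2, x3 = mask
    # One main effect and two pairwise interactions (assumed for illustration).
    return 1.0 * x0 + 2.0 * x1 * x2 - 0.5 * x0 * x3


def subsets(items):
    """Yield every subset of `items` as a sorted tuple."""
    items = tuple(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def mobius_coefficients(f, n):
    """Interaction coefficient of each subset S via inclusion-exclusion over T in S."""
    coeffs = {}
    for S in subsets(range(n)):
        total = 0.0
        for T in subsets(S):
            mask = tuple(1 if i in T else 0 for i in range(n))
            total += (-1) ** (len(S) - len(T)) * f(mask)
        coeffs[S] = total
    return coeffs


def reconstruct(coeffs, mask):
    """Sum the coefficients of all interactions whose features are all unmasked."""
    return sum(a for S, a in coeffs.items() if all(mask[i] for i in S))


def faithfulness(f, coeffs, n):
    """R^2 between the model and its reconstruction over all 2^n masks."""
    masks = list(product((0, 1), repeat=n))
    truth = [f(m) for m in masks]
    approx = [reconstruct(coeffs, m) for m in masks]
    mean = sum(truth) / len(truth)
    ss_res = sum((t - a) ** 2 for t, a in zip(truth, approx))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


n = 4
coeffs = mobius_coefficients(toy_model, n)
# Keep only the three largest-magnitude interactions (a sparse explanation).
top = dict(sorted(coeffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3])
```

In this toy setting, `top` recovers the three terms built into `toy_model`, and `faithfulness(toy_model, top, n)` measures how well that sparse set of interactions reproduces the model's outputs. Dropping interactions from `top` lowers the score, which is the sense in which a small set of interactions can be more or less faithful.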