Source: arXiv

ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

  • Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features.
  • ProxySPEX is an interaction attribution algorithm designed to efficiently discover hierarchical feature interactions in LLMs.
  • ProxySPEX outperforms prior methods: it reconstructs LLM outputs more faithfully while using fewer inferences, and it identifies influential features more effectively.
  • Experiments demonstrate ProxySPEX's effectiveness across high-dimensional datasets and its applications in data attribution and mechanistic interpretability tasks.
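
To make "feature interaction" concrete, here is a minimal sketch of a brute-force pairwise interaction score computed from masked inputs. It is not the ProxySPEX algorithm, which the summary above does not describe. The `toy_model` function is a hypothetical stand-in for an LLM's output on a masked input, and `pairwise_interactions` is an illustrative helper name. The score for a pair is the inclusion-exclusion difference between keeping both features, keeping each alone, and keeping none.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple


def toy_model(kept: FrozenSet[int]) -> float:
    """Hypothetical stand-in for an LLM output on a masked input.

    Features 0 and 1 only contribute when both are kept;
    feature 2 contributes on its own.
    """
    score = 0.0
    if 0 in kept and 1 in kept:
        score += 2.0
    if 2 in kept:
        score += 0.5
    return score


def pairwise_interactions(
    model: Callable[[FrozenSet[int]], float], n_features: int
) -> Tuple[Dict[Tuple[int, int], float], int]:
    """Score every feature pair by inclusion-exclusion over masks.

    Returns the scores and the number of model calls used.
    """
    calls = 0

    def query(kept: FrozenSet[int]) -> float:
        nonlocal calls
        calls += 1
        return model(kept)

    empty = query(frozenset())
    singles = {i: query(frozenset({i})) for i in range(n_features)}
    scores: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(n_features), 2):
        both = query(frozenset({i, j}))
        # Joint effect minus what each feature explains alone.
        scores[(i, j)] = both - singles[i] - singles[j] + empty
    return scores, calls


scores, calls = pairwise_interactions(toy_model, 3)
print(scores)
print(calls)
```

In this toy setup only the pair (0, 1) receives a nonzero score. The brute-force approach needs 1 + n + n(n-1)/2 model calls for pairs alone, and more for higher-order interactions, which is the kind of inference cost that motivates more efficient methods.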
