Feature attribution methods explain the behavior of machine learning models by assigning an importance score to each input feature.
Because evaluating these methods empirically is challenging, axiomatic frameworks have been proposed to establish their credibility.
This work introduces a new feature attribution framework that departs from restrictive axioms: it defines attributions for simple models first and then builds on those definitions.
The framework yields closed-form expressions for the attributions of deep ReLU networks and addresses the optimization of evaluation metrics that are based on feature attributions.