Generative models are difficult to evaluate reliably for sample quality in critical applications, in part because quality spans two distinct concepts: fidelity and coverage.
To address this issue, two novel metrics, Clipped Density and Clipped Coverage, have been introduced to prevent out-of-distribution samples from biasing aggregated values.
These metrics exhibit linear score degradation as the proportion of poor samples increases, so a score can be read directly as an equivalent proportion of good samples.
Extensive experiments show that Clipped Density and Clipped Coverage outperform existing methods for evaluating generative models in terms of robustness, sensitivity, and interpretability.
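
To make the clipping idea concrete, the following is a minimal sketch of a fidelity-style score in which both the nearest-neighbor radii and the per-sample contributions are capped. It is an illustration under stated assumptions, not the metrics' actual definition: the function names `knn_radii` and `clipped_density_sketch`, the choice of the median radius as the cap, and the cap of 1 on each sample's contribution are all assumptions made here, and the calibration step is omitted.

```python
import numpy as np


def knn_radii(real, k):
    """Distance from each real point to its k-th nearest real neighbor."""
    d = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    # After sorting, column 0 is the zero self-distance, so column k is the k-th neighbor.
    return np.sort(d, axis=1)[:, k]


def clipped_density_sketch(real, fake, k=5):
    """Hypothetical clipped fidelity score in [0, 1] (illustrative only)."""
    radii = knn_radii(real, k)
    # Assumption: cap radii at the median so real outliers cannot own huge balls.
    radii = np.minimum(radii, np.median(radii))
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    # For each generated sample, count the real balls that contain it, scaled by k.
    per_sample = (d <= radii[None, :]).sum(axis=1) / k
    # Assumption: cap each contribution at 1 so no single sample dominates the mean.
    per_sample = np.minimum(per_sample, 1.0)
    return per_sample.mean()


rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
good = rng.normal(size=(100, 2))   # drawn from the same distribution as real
bad = good + 100.0                 # far out-of-distribution samples

print("all good :", clipped_density_sketch(real, good))
print("half bad :", clipped_density_sketch(real, np.vstack([good, bad])))
print("all bad  :", clipped_density_sketch(real, bad))
```

In this toy setting the out-of-distribution samples contribute zero, so replacing half of the generated set with them halves the score relative to the clean set. This mirrors the proportional reading described above, although only for this simplified construction.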