The wide availability of machine learning systems has led to the use of synthetic (model-predicted) labels in statistical inference.
The Prediction Powered Inference (PPI) framework aims to combine pseudo-labelled data with a small sample of real high-quality labels for efficient evaluation.
When labelled data is scarce, the PPI++ method may perform poorly compared to traditional inference methods like ordinary least squares regression.
The study relates PPI++ to regression techniques and introduces new PPI-based approaches that use robust regressors to improve estimation when few labels are available.
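
To make the link between PPI++ and regression concrete, the following is a minimal sketch for the simplest case, estimating a population mean. It assumes the standard power-tuned form of the estimator, in which the labelled mean is corrected by a weight `lam` times the gap between the mean prediction on unlabelled and labelled data. The function name `ppi_pp_mean` and its arguments are hypothetical and chosen for illustration; this is not the robust method introduced in the study.

```python
from statistics import fmean


def ppi_pp_mean(y_lab, f_lab, f_unlab):
    """Sketch of a power-tuned PPI mean estimate (illustrative, hypothetical API).

    y_lab   : real labels on the small labelled sample (length n)
    f_lab   : model predictions on the same labelled sample (length n)
    f_unlab : model predictions on the large unlabelled sample (length N)
    Returns (estimate, lam).
    """
    n, N = len(y_lab), len(f_unlab)
    y_bar, f_bar = fmean(y_lab), fmean(f_lab)

    # Sample covariance between labels and predictions, from n labelled points only.
    cov_yf = sum((y - y_bar) * (f - f_bar) for y, f in zip(y_lab, f_lab)) / (n - 1)

    # Variance of the predictions, pooled over labelled and unlabelled data.
    pooled = list(f_lab) + list(f_unlab)
    p_bar = fmean(pooled)
    var_f = sum((f - p_bar) ** 2 for f in pooled) / (len(pooled) - 1)

    # Plug-in tuning weight: a regression-slope-like ratio, shrunk by (1 + n/N).
    # Falls back to 0 (the classical labelled mean) if predictions are constant.
    lam = cov_yf / ((1 + n / N) * var_f) if var_f > 0 else 0.0

    # Labelled mean plus the weighted prediction-based correction.
    estimate = y_bar + lam * (fmean(f_unlab) - f_bar)
    return estimate, lam


if __name__ == "__main__":
    est, lam = ppi_pp_mean([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [2.0, 2.5, 1.5, 2.2, 1.8, 2.4])
    print(f"estimate={est:.3f}, lam={lam:.3f}")
```

In this sketch `lam` has the shape of a least-squares slope of the labels on the predictions, estimated from only `n` labelled points. With very few labels the covariance term is itself noisy, which gives one way to see why a tuned estimator of this kind can become unstable in the low-label regime.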