This paper introduces a framework for efficient inference on parameters derived from unstructured data such as text, images, audio, and video.
The framework addresses bias in neural network predictions and in the downstream estimators that rely on structured data extracted from unstructured inputs.
By reframing inference with unstructured data as a missing structured data problem, the framework applies classic results from semiparametric inference to create valid, efficient, and robust estimators.
The framework, called MAR-S, gives economists tools for constructing unbiased estimators from unstructured data and is demonstrated by re-analyzing influential studies.
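
To make the missing-data framing concrete, the following is a minimal sketch, not the paper's own estimator. It assumes a simple setting: a model prediction is available for every observation, while the true structured value is hand-annotated only for a random subsample drawn with known probability `pi`. Under those assumptions, a standard augmented inverse-probability-weighted (AIPW) correction from the semiparametric missing-data literature removes the bias of the naive average of predictions. The function name `debiased_mean`, the simulated data, and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

def debiased_mean(pred, label, annotated, pi):
    """AIPW-style estimate of E[Y] when Y is observed only where annotated.

    pred      : model prediction for every observation
    label     : ground-truth value (may be NaN where not annotated)
    annotated : boolean mask, True where a ground-truth label exists
    pi        : known probability that an observation was annotated (assumption)
    """
    pred = np.asarray(pred, dtype=float)
    annotated = np.asarray(annotated, dtype=bool)
    # Prediction error, used only on the annotated subsample; zero elsewhere.
    resid = np.where(annotated, np.asarray(label, dtype=float) - pred, 0.0)
    # Prediction plus inverse-probability-weighted correction term.
    scores = pred + resid / pi
    est = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return est, se

# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
y = rng.binomial(1, 0.3, size=n).astype(float)        # true structured variable
# Hypothetical classifier score: informative about y but systematically biased.
pred = np.clip(0.15 + 0.7 * y + rng.normal(0.0, 0.1, size=n), 0.0, 1.0)
pi = 0.1                                               # annotation probability
annotated = rng.random(n) < pi                         # random annotation draw
label = np.where(annotated, y, np.nan)                 # truth is missing elsewhere

naive = pred.mean()                                    # biased plug-in estimate
est, se = debiased_mean(pred, label, annotated, pi)
print(f"true mean {y.mean():.3f}, naive {naive:.3f}, debiased {est:.3f} (se {se:.3f})")
```

In this simulation the naive average inherits the classifier's systematic error, while the corrected estimate is centered on the true mean at the cost of a somewhat larger standard error. The sketch covers only a population mean with a known annotation probability; other target parameters or annotation schemes would need a correspondingly different correction.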