Forecasting the Consumer Price Index (CPI) is an essential yet complex task in economics, and existing forecasts often rely on survey-based data.
This paper introduces LLM-CPI, an approach that utilizes large language models (LLMs) to improve CPI prediction by incorporating high-frequency online text data.
LLMs such as ChatGPT and BERT generate continuous inflation labels for online texts collected from a Chinese social network site.
Embeddings of the online texts are extracted with latent Dirichlet allocation (LDA) and BERT.
A joint time series framework is developed that merges monthly CPI data with LLM-generated daily CPI surrogates.
The monthly model combines observed CPI data, text embeddings, and macroeconomic variables in an autoregressive-with-exogenous-inputs (ARX) structure.
The daily model uses LLM-generated CPI surrogates and text embeddings in a vector autoregressive-with-exogenous-inputs (VARX) structure.
The method's asymptotic properties are analyzed, and two forms of prediction intervals are provided.
The performance and advantages of LLM-CPI are illustrated through simulation studies and real-data examples.
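To make the ARX and VARX structures named above concrete, the following is a minimal sketch of fitting such models by ordinary least squares. It is not the estimator of LLM-CPI, which the abstract does not specify. The function name `fit_varx`, the lag order, the stacking of text embeddings and macroeconomic variables into a single exogenous matrix `X`, and the synthetic noise-free data are all assumptions made for illustration. The ARX model is treated as the one-dimensional special case of VARX.

```python
import numpy as np


def fit_varx(Y, X, p=1):
    """Least-squares fit of Y_t = c + sum_{j=1..p} A_j Y_{t-j} + B X_t + e_t.

    Y: (T, k) responses (k = 1 gives an ARX model); X: (T, m) exogenous inputs.
    Returns (c, A, B) with shapes (k,), (p, k, k), (k, m).
    Hypothetical helper for illustration only.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    if X.ndim == 1:
        X = X[:, None]
    T, k = Y.shape
    # Each design row is [1, Y_{t-1}, ..., Y_{t-p}, X_t] for t = p, ..., T-1.
    lags = [Y[p - j:T - j] for j in range(1, p + 1)]
    Z = np.hstack([np.ones((T - p, 1)), *lags, X[p:]])
    coef, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)  # (1 + p*k + m, k)
    c = coef[0]
    A = np.stack([coef[1 + j * k:1 + (j + 1) * k].T for j in range(p)])
    B = coef[1 + p * k:].T
    return c, A, B


# Synthetic, noise-free data (assumed values, not results from the paper).
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=(T, 2))  # stand-in for embeddings and macro variables

# "Monthly" ARX(1): scalar response.
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.6 * y[t - 1] + x[t] @ np.array([1.0, -0.5])
c1, A1, B1 = fit_varx(y, x, p=1)

# "Daily" VARX(1): two-dimensional response.
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
B_true = np.array([[1.0, 0.0], [0.5, -1.0]])
Y2 = np.zeros((T, 2))
for t in range(1, T):
    Y2[t] = A_true @ Y2[t - 1] + B_true @ x[t]
c2, A2, B2 = fit_varx(Y2, x, p=1)
```

Because the simulated series contain no noise, the fitted coefficients should match the generating values up to numerical precision. With real data, the residuals from such a fit would be the starting point for constructing prediction intervals.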