Accurate evaluation of forecasting models is essential for ensuring reliable predictions.
Current practice for evaluating and comparing forecasting models summarizes performance in a single score, which can hide how a model behaves under varying conditions.
To address this limitation, ModelRadar is proposed as a framework for evaluating univariate time series forecasting models across multiple aspects, such as stationarity, presence of anomalies, or forecasting horizons.
In a comparison of 24 forecasting methods, including classical approaches and a range of machine learning algorithms, NHITS, a state-of-the-art neural network architecture, performs best overall, but its advantage varies with the forecasting conditions.
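The contrast between a single overall score and an aspect-based evaluation can be illustrated with a minimal sketch. It is not the ModelRadar implementation: the choice of SMAPE as the error metric, the function names `smape` and `score_by_condition`, the model labels `model_a` and `model_b`, and the toy data are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def smape(actual, predicted):
    """Symmetric mean absolute percentage error of one series (range 0 to 2).

    SMAPE is an assumed metric for this sketch; any per-series error works.
    """
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return mean(terms)


def score_by_condition(records):
    """Return (overall score per model, score per (model, condition) pair)."""
    overall = defaultdict(list)
    by_condition = defaultdict(list)
    for r in records:
        err = smape(r["actual"], r["predicted"])
        overall[r["model"]].append(err)
        by_condition[(r["model"], r["condition"])].append(err)
    return (
        {model: mean(errs) for model, errs in overall.items()},
        {key: mean(errs) for key, errs in by_condition.items()},
    )


# Hypothetical toy data: one series per model and condition.
records = [
    {"model": "model_a", "condition": "stationary",
     "actual": [10, 10], "predicted": [10, 10]},
    {"model": "model_a", "condition": "non_stationary",
     "actual": [10, 20], "predicted": [20, 10]},
    {"model": "model_b", "condition": "stationary",
     "actual": [10, 10], "predicted": [12, 12]},
    {"model": "model_b", "condition": "non_stationary",
     "actual": [10, 20], "predicted": [12, 18]},
]

overall, by_condition = score_by_condition(records)
for model, score in sorted(overall.items()):
    print(f"{model}: overall SMAPE = {score:.3f}")
for (model, condition), score in sorted(by_condition.items()):
    print(f"{model} / {condition}: SMAPE = {score:.3f}")
```

In this toy data, `model_b` has the lower overall error, yet `model_a` is more accurate on the stationary series. A single aggregate score would not reveal that reversal, which is the kind of information a per-condition breakdown is meant to surface.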