A study of scaling laws for encoder-decoder autoregressive transformer models applied to motion forecasting and planning in the autonomous driving domain.
Model performance improves with the total compute budget according to a power law, as in language modeling, and training loss correlates with downstream evaluation metrics.
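To illustrate the power-law relationship, the sketch below fits a loss-versus-compute curve in log-log space; the variable names and numeric values are illustrative assumptions, not figures or code from the study.

```python
import numpy as np

# Hypothetical (compute, loss) pairs; real values would come from training runs
# at several model/data scales.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # total training FLOPs (assumed)
loss = np.array([1.20, 0.95, 0.76, 0.61])     # evaluation loss (assumed)

# A power law L(C) = a * C**(-b) is linear in log-log space:
# log L = log a - b * log C, so a least-squares line fit recovers (a, b).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: L(C) ~ {a:.3g} * C^(-{b:.3g})")
```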
Closed-loop metrics also improve with scaling, which bears on how suitable open-loop metrics are as proxies for model development and hill climbing.
An analysis of the optimal allocation between transformer parameter count and training-data size indicates that model size should grow faster than dataset size as the training compute budget increases.
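In the notation common to scaling-law studies (the symbols and exponents here are generic placeholders, not values reported by the study), this finding corresponds to compute-optimal allocations that follow power laws in the compute budget C:

N^{*}(C) \propto C^{a}, \qquad D^{*}(C) \propto C^{b}, \qquad a > b

where N is the number of model parameters and D the amount of training data; a > b encodes that parameters should grow faster than data as C increases. If training compute is roughly proportional to the product N \cdot D, the exponents additionally satisfy a + b \approx 1.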