Deep neural networks need reliable uncertainty calibration for safe deployment in critical applications.
Foundation models such as ConvNeXt, EVA, and BEiT have improved predictive performance, but their calibration properties remain poorly understood.
A study investigated the calibration behavior of these models, and its findings challenge several prevailing assumptions.
Empirical analysis shows that foundation models are often underconfident on in-distribution data, which inflates their calibration error.
Under distribution shift, however, they remain comparatively well calibrated.
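These findings rest on calibration-error metrics such as the expected calibration error (ECE). The following is a minimal sketch of a binned ECE computation, assuming NumPy; the function name, bin count, and binning scheme are illustrative assumptions rather than the study's exact implementation.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Equal-width binned ECE: weighted average of |accuracy - confidence| per bin.

    Illustrative sketch; the study's exact binning choices may differ.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_conf = confidences[in_bin].mean()  # average predicted confidence in the bin
        bin_acc = correct[in_bin].mean()       # empirical accuracy in the bin
        # Underconfidence shows up as bin_acc > bin_conf; both directions add to ECE.
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece
```

A model is underconfident when its binned accuracy exceeds its binned confidence, which is the pattern the study reports for foundation models in distribution.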
Foundation models respond well to post-hoc calibration techniques in the in-distribution setting, which mitigates their underconfidence bias.
Under severe distribution shift, however, these techniques lose effectiveness and can even be counterproductive.
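The most common post-hoc technique in this setting is temperature scaling, which fits a single scalar T on held-out in-distribution data and rescales the logits to logits / T. The sketch below, assuming NumPy and SciPy, illustrates the idea; the function name, optimizer choice, and search bounds are assumptions, not the study's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a single temperature T > 0 on held-out in-distribution data by
    minimizing the negative log-likelihood of softmax(logits / T).

    Illustrative sketch; the study's optimizer and validation split may differ.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)

    def nll(log_t):
        t = np.exp(log_t)  # parameterize log T so T stays positive
        scaled = logits / t
        scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    result = minimize_scalar(nll, bounds=(np.log(0.05), np.log(20.0)), method="bounded")
    return float(np.exp(result.x))
```

For an underconfident model the fitted temperature typically comes out below 1, sharpening the softmax in distribution; reusing that same T under a severe distribution shift is precisely where the study reports diminishing or adverse effects.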
The study highlights the intricate effect of architectural and training advances on calibration, challenging the assumption that calibration improves steadily alongside predictive performance.