Multi-modal learning has achieved remarkable success by integrating information from various modalities, surpassing uni-modal approaches in tasks like recognition and retrieval.
In real-world scenarios, however, models often encounter novel modalities that were unseen during training, for example because resource and privacy constraints prevent collecting data for every modality, a challenge that existing methods do not adequately address.
This paper introduces Modality Generalization (MG), which aims to enable models to generalize to modalities unseen during training; it defines two cases, Weak MG and Strong MG, and proposes a benchmark for assessing them.
Experiments reveal the complexity of MG, highlight the limitations of current methods, and suggest key research directions for developing multi-modal models that can adapt to unseen modalities.