Multimodal Federated Learning (MFL) combines the benefits of leveraging multiple modalities to enhance downstream inference performance with those of distributed training for efficiency and privacy preservation.
Despite growing interest in MFL, a comprehensive taxonomy organizing MFL within the different Federated Learning (FL) paradigms is still lacking.
Moreover, the challenges MFL faces, such as modality heterogeneity and communication inefficiency, differ significantly from those in traditional unimodal FL.
This paper systematically examines MFL in the context of horizontal FL (HFL), vertical FL (VFL), and hybrid FL paradigms, discussing the challenges specific to each and offering insights for future research.