Extended reality (XR) systems, encompassing virtual reality (VR), augmented reality (AR), and mixed reality (MR), provide immersive human-computer interaction. The paper introduces the concept of multi-modal multi-task (M3T) federated foundation models (FedFMs) for XR systems. FedFMs pair the capabilities of M3T foundation models with privacy-preserving federated learning, in which models are trained across devices without centralizing raw user data. The paper focuses on XR-specific challenges: sensor diversity, hardware constraints, interactivity, task variability, and environmental factors.