Integrating multimodal Electronic Health Records (EHR) data has the potential to improve the prediction of clinical outcomes.
Previous work has focused on temporal interactions within individual samples and on the fusion of multimodal information, overlooking critical temporal patterns that recur across patients.
Identifying such temporal patterns, for example abnormal vital-sign trends together with their corresponding textual descriptions, is crucial for accurate outcome prediction.
A Cross-Modal Temporal Pattern Discovery (CTPD) framework is introduced to extract cross-modal temporal patterns efficiently.
CTPD refines a set of shared initial temporal pattern representations with slot attention to generate temporal semantic embeddings.
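To make the slot-attention step more concrete, here is a minimal NumPy sketch of slot attention over one patient's per-timestep features. It is not the CTPD implementation. We read "shared initial representations" as one set of initial slots (`init_slots`) that is reused for every patient. The random projection matrices `w_q`, `w_k`, `w_v` stand in for learned weights. We also simplify the original slot-attention update: there is no LayerNorm, GRU, or MLP, only a residual update. All of these names and choices are hypothetical illustrations.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(inputs, init_slots, w_q, w_k, w_v, n_iters=3, eps=1e-8):
    """Simplified slot attention (hypothetical sketch, not CTPD's code).

    inputs:     (T, D) per-timestep features for one patient
    init_slots: (K, D) initial pattern slots, identical for every patient
    w_q/w_k/w_v: (D, D) projections standing in for learned weights
    Returns the refined (K, D) slots and the final (T, K) attention map.
    """
    d = w_q.shape[1]
    k = inputs @ w_k                      # keys from timesteps, (T, D)
    v = inputs @ w_v                      # values from timesteps, (T, D)
    slots = init_slots.copy()
    for _ in range(n_iters):
        q = slots @ w_q                   # queries from slots, (K, D)
        logits = k @ q.T / np.sqrt(d)     # (T, K)
        # Softmax over the slot axis: slots compete to explain each timestep.
        attn = softmax(logits, axis=1)
        # Renormalize over timesteps so each update is a weighted mean.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
        updates = weights.T @ v           # (K, D)
        # Simplified residual update in place of the GRU + MLP of the original.
        slots = slots + updates
    return slots, attn


# Usage: 48 hourly steps, 16-dim features, 8 candidate temporal patterns.
rng = np.random.default_rng(0)
T, D, K = 48, 16, 8
inputs = rng.normal(size=(T, D))
init_slots = rng.normal(size=(K, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
slots, attn = slot_attention(inputs, init_slots, w_q, w_k, w_v)
print(slots.shape, attn.shape)  # (8, 16) (48, 8)
```

Because the softmax runs over slots rather than timesteps, each timestep spreads its attention across the K patterns. Because the slots start from the same initialization for every patient, slot k plays a comparable role across patients.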
A contrastive TPNCE loss is introduced to align the learned patterns across modalities, together with two reconstruction losses.
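The exact TPNCE formulation is not given here, including how positives and negatives are chosen among patterns. For orientation only, the sketch below shows the standard symmetric InfoNCE loss that contrastive cross-modal alignment objectives typically build on. It assumes that row i of `z_a` (for example, a time-series pattern embedding) and row i of `z_b` (for example, the matching clinical-note pattern embedding) form a positive pair, and that every other row in the batch serves as a negative. The function name `info_nce` and the `temperature` value are hypothetical.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE (generic stand-in, not the paper's TPNCE).

    z_a, z_b: (N, D) embeddings from two modalities; row i of each is a
    positive pair, and all other rows in the batch act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature    # (N, N) scaled cosine similarities
    idx = np.arange(len(z_a))
    # a -> b: each a_i must pick out b_i among all b (row-wise softmax).
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    # b -> a: each b_i must pick out a_i among all a (column-wise softmax).
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


# Usage: nearly identical cross-modal embeddings should give a much lower
# loss than unrelated ones.
rng = np.random.default_rng(0)
z_ts = rng.normal(size=(4, 16))
aligned = info_nce(z_ts, z_ts + 0.01 * rng.normal(size=(4, 16)))
unrelated = info_nce(z_ts, rng.normal(size=(4, 16)))
print(aligned < unrelated)  # True
```

Minimizing this loss pulls matched cross-modal pairs together and pushes apart mismatched pairs within the batch. A pattern-level variant would apply the same idea to the slot embeddings rather than to whole-sample embeddings.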
Evaluations on the 48-hour in-hospital mortality and 24-hour phenotype classification tasks, both built from the MIMIC-III database, show that the method outperforms the approaches it is compared against.