Graph machine learning has made significant strides in recent years, yet the integration of visual information with graph structure, and its potential to improve performance on downstream tasks, remain underexplored.
To address this critical gap, the Multimodal Graph Benchmark (MM-GRAPH) is introduced as a comprehensive evaluation framework for multimodal graph learning, incorporating both visual and textual information into graph learning tasks.
MM-GRAPH comprises seven diverse datasets whose nodes carry rich multimodal attributes, including visual features, and is designed to assess algorithms on a range of real-world tasks.
An extensive empirical study on these datasets yields insights into the challenges and opportunities of integrating visual data into graph learning.