Multimodal large language models (MLLMs) can process visual, textual, and auditory data.
Existing video question-answering benchmarks often exhibit bias towards a single modality.
The modality importance score (MIS) is introduced to identify and assess modality bias.
MLLM-derived MIS can guide the curation of modality-balanced datasets to enhance multimodal learning.
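One plausible way to realize a modality importance score is to measure the relative accuracy drop when a single modality is ablated from the input; the function and numbers below are a hypothetical sketch for illustration, not the paper's exact definition.

```python
def modality_importance_score(full_acc: float, ablated_acc: float) -> float:
    """Hypothetical MIS: relative accuracy drop when one modality is removed.

    full_acc    -- accuracy with all modalities available
    ablated_acc -- accuracy with the modality of interest removed
    """
    if full_acc <= 0.0:
        return 0.0
    # Clamp at zero so a modality whose removal helps scores as unimportant.
    return max(0.0, (full_acc - ablated_acc) / full_acc)

# Illustrative (made-up) video-QA accuracies for one benchmark:
full = 0.80
ablated = {"visual": 0.55, "text": 0.78, "audio": 0.72}
scores = {m: modality_importance_score(full, a) for m, a in ablated.items()}
print(scores)
```

Under this sketch, a benchmark where one modality's score dwarfs the others (here, visual) would be flagged as modality-biased, and such scores could guide the selection of questions that require all modalities.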