Researchers introduce MMIE, a large-scale benchmark for evaluating multimodal comprehension and generation in Large Vision-Language Models (LVLMs). MMIE comprises 20K curated multimodal queries spanning diverse categories and subfields. The benchmark supports interleaved inputs and outputs, assessing competencies through both multiple-choice and open-ended questions. The authors also propose an automated evaluation metric with reduced bias and improved accuracy.
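The mixed multiple-choice and open-ended protocol described above can be sketched as a simple scoring loop. This is a hypothetical illustration, not MMIE's actual pipeline: the `Query` fields, the 0-5 rubric, and the `judge` callable are all assumptions; MMIE's real metric uses its own fine-tuned scoring model.

```python
# Hypothetical sketch of automated scoring for a benchmark that mixes
# multiple-choice and open-ended questions. All names and the 0-5 rubric
# are illustrative assumptions, not MMIE's actual evaluation pipeline.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Query:
    question: str
    reference: str   # gold option letter, or a grading rubric/reference answer
    kind: str        # "multiple_choice" or "open_ended"

def score_response(query: Query, response: str,
                   judge: Callable[[str], str]) -> float:
    """Return a score in [0, 1] for a single model response."""
    if query.kind == "multiple_choice":
        # Multiple-choice answers are checked by exact match on the option letter.
        return float(response.strip().upper() == query.reference.strip().upper())
    # Open-ended answers are delegated to a judge model that applies a fixed
    # rubric, which can reduce the bias of free-form grading.
    prompt = (
        f"Question: {query.question}\n"
        f"Reference: {query.reference}\n"
        f"Response: {response}\n"
        "Rate the response on an integer scale from 0 to 5."
    )
    raw = judge(prompt)          # assumed to return a numeric string, e.g. "4"
    return int(raw.strip()) / 5.0

def benchmark_score(queries: List[Query], responses: List[str],
                    judge: Callable[[str], str]) -> float:
    """Average per-query scores into one benchmark score in [0, 1]."""
    scores = [score_response(q, r, judge) for q, r in zip(queries, responses)]
    return sum(scores) / len(scores)
```

A real pipeline would also handle parsing failures from the judge and report per-category breakdowns rather than a single average.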