A new study explores multimodal in-context learning (ICL) in the medical domain, a setting where the ability to adapt to new tasks from only a handful of examples is especially valuable.
It introduces SMMILE, an expert-curated benchmark for multimodal medical ICL, comprising 111 problems that span 6 medical specialties and 13 imaging modalities.
An evaluation of 15 multimodal large language models (MLLMs) on SMMILE shows that most exhibit moderate to poor multimodal ICL ability.
ICL yields only a slight improvement over zero-shot performance on SMMILE, and the evaluated models are susceptible to irrelevant in-context examples and sensitive to the ordering of the examples they are given.
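To make the setup concrete, the sketch below shows how a multimodal ICL prompt is typically assembled: a few (image, question, answer) examples are interleaved before the query image and question, and the zero-shot baseline corresponds to using no in-context examples at all. This is a minimal illustration, not code from the study; the `Example` dataclass and `build_icl_messages` helper are hypothetical, and the message layout follows the common OpenAI-style chat format with image content parts.

```python
# Minimal sketch of multimodal ICL prompt construction (illustrative only).
import base64
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Example:
    image_path: Path   # local medical image
    question: str      # free-text question about the image
    answer: str        # expert answer (unused for the query item)

def _image_part(path: Path) -> dict:
    """Encode a local image as a base64 data-URL content part."""
    b64 = base64.b64encode(path.read_bytes()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def build_icl_messages(context: list[Example], query: Example) -> list[dict]:
    """Interleave k in-context examples before the query.

    With context=[] this reduces to the zero-shot prompt used as the baseline;
    reordering `context` probes sensitivity to example ordering.
    """
    messages = []
    for ex in context:
        messages.append({"role": "user",
                         "content": [_image_part(ex.image_path),
                                     {"type": "text", "text": ex.question}]})
        messages.append({"role": "assistant", "content": ex.answer})
    messages.append({"role": "user",
                     "content": [_image_part(query.image_path),
                                 {"type": "text", "text": query.question}]})
    return messages
```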