A unified framework, MB-ORES, has been proposed to integrate object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery.
MB-ORES fine-tunes an open-set object detector on referring expression data to support OD and to establish a prior for the VG task in remote sensing.
The model comprises two components: a multi-branch network that generates task-aware proposals by integrating spatial, visual, and categorical features, and an object reasoning network that assigns a probability to each proposal to localize the referred object.
MB-ORES outperforms existing methods on the OPT-RSVG and DIOR-RSVG datasets while retaining classical OD capabilities.
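
The proposal-scoring step described above can be illustrated with a minimal sketch. It is not the MB-ORES implementation: the function names (`fuse_features`, `select_referred_object`), the feature dimensions, and the hand-set linear scorer are all hypothetical stand-ins for the learned multi-branch and object reasoning networks. The sketch only shows the general pattern of fusing spatial, visual, and categorical features per proposal, turning scores into probabilities over proposals, and picking the most probable one.

```python
import math
from typing import List, Sequence, Tuple


def fuse_features(spatial: Sequence[float],
                  visual: Sequence[float],
                  categorical: Sequence[float]) -> List[float]:
    """Concatenate the three feature groups into one proposal vector.

    Assumption: simple concatenation stands in for the learned fusion.
    """
    return list(spatial) + list(visual) + list(categorical)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_referred_object(proposals: Sequence[Sequence[float]],
                           weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Score each fused proposal and return (best index, probabilities).

    Assumption: a fixed linear scorer replaces the learned reasoning network.
    """
    scores = [sum(w * x for w, x in zip(weights, p)) for p in proposals]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical proposals: box (cx, cy, w, h), 2-d visual feature, 2-class one-hot.
proposals = [
    fuse_features([0.1, 0.1, 0.2, 0.2], [0.5, 0.1], [1.0, 0.0]),
    fuse_features([0.8, 0.8, 0.1, 0.1], [0.9, 0.7], [0.0, 1.0]),
    fuse_features([0.5, 0.5, 0.3, 0.3], [0.2, 0.2], [1.0, 0.0]),
]
# Hypothetical weights that favor the visual features and the second class.
weights = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0]

best_index, probabilities = select_referred_object(proposals, weights)
```

In this toy setup the second proposal receives the highest score, so `best_index` is 1. In a real system the scores would come from a trained network conditioned on the referring expression rather than from fixed weights.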