Researchers introduce SCENIR, a novel unsupervised, scene-graph-based image retrieval framework that prioritizes semantic content over low-level visual features.
SCENIR uses a Graph Autoencoder-based approach that removes the need for labeled training data, and it outperforms existing models in both retrieval performance and runtime efficiency.
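The following sketch shows why a graph autoencoder needs no labels. It trains a minimal *linear* graph autoencoder on a single toy graph. The encoder propagates node features through a normalized adjacency matrix, and the decoder scores every node pair with an inner product. The only training signal is how well the graph's own adjacency is reconstructed. This is a generic GAE, not SCENIR's architecture. The toy graph, one-hot node features, hyperparameters, and function names (`train_linear_gae` and the helpers) are all hypothetical. Edge directions and predicate labels are ignored, and a real system would train over many graphs with an autodiff framework.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN-style normalization: D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruction_loss(target, logits):
    """Mean binary cross-entropy between the target adjacency and decoded edge probabilities."""
    p = np.clip(sigmoid(logits), 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def train_linear_gae(adj, features, dim=4, lr=0.5, epochs=200, seed=0):
    """Hypothetical linear GAE: Z = A_hat X W, decoder sigmoid(Z Z^T), trained by plain gradient descent."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    ax = normalize_adjacency(adj) @ features          # propagated node features (fixed)
    w = rng.normal(scale=0.1, size=(features.shape[1], dim))
    target = np.minimum(adj + np.eye(n), 1.0)          # reconstruct A with self-loops (a common choice)
    losses = []
    for _ in range(epochs):
        z = ax @ w                                     # node embeddings
        logits = z @ z.T                               # inner-product decoder
        losses.append(reconstruction_loss(target, logits))
        grad_logits = (sigmoid(logits) - target) / (n * n)  # d(mean BCE)/d(logits)
        grad_z = (grad_logits + grad_logits.T) @ z          # chain rule through Z Z^T
        w -= lr * (ax.T @ grad_z)                           # chain rule through Z = AX W
    return w, losses


# Toy undirected scene graph over 4 objects (e.g. man, horse, hat, grass); predicates are ignored.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
features = np.eye(4)  # hypothetical one-hot object-class features
w, losses = train_linear_gae(adj, features)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss compares the decoder's output with the input graph itself, so no human annotation enters the objective. Graph-level embeddings for retrieval could then be obtained by, for example, pooling the node embeddings. That pooling step is likewise an assumption here, not SCENIR's documented design.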
The work uses Graph Edit Distance (GED) between scene graphs as a more reliable ground-truth similarity measure for evaluating image-to-image retrieval, replacing inconsistent caption-based supervision.
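The following sketch makes GED concrete. It computes an exact graph edit distance between tiny labeled scene graphs by brute force over node mappings. Every node or edge insertion, deletion, or relabeling costs 1. The unit cost model, the graph encoding (a list of object labels plus a dict of directed, predicate-labeled edges), and the `scene_graph_ged` function are illustrative assumptions, not the costs used in SCENIR's evaluation. Exact GED is NP-hard, so this enumeration is only viable for a handful of nodes.

```python
from itertools import permutations


def scene_graph_ged(nodes1, edges1, nodes2, edges2):
    """Exact unit-cost GED between two small labeled directed graphs (hypothetical helper).

    nodes*: list of object labels; edges*: dict mapping (src, dst) index pairs to predicate labels.
    """
    n1, n2 = len(nodes1), len(nodes2)
    # Each node of graph 1 maps to a distinct node of graph 2, or to None (deletion).
    candidates = list(range(n2)) + [None] * n1
    best = float("inf")
    for mapping in set(permutations(candidates, n1)):
        cost = 0
        for i, j in enumerate(mapping):
            if j is None or nodes1[i] != nodes2[j]:
                cost += 1                      # node deletion or relabeling
        used = {j for j in mapping if j is not None}
        cost += n2 - len(used)                 # unmatched graph-2 nodes are insertions
        matched = set()
        for (u, v), label in edges1.items():
            mu, mv = mapping[u], mapping[v]
            if mu is not None and mv is not None and (mu, mv) in edges2:
                matched.add((mu, mv))
                cost += edges2[(mu, mv)] != label  # predicate relabeling
            else:
                cost += 1                          # edge deletion
        cost += len(edges2) - len(matched)         # edge insertions
        best = min(best, cost)
    return best


query = (["man", "horse"], {(0, 1): "riding"})
gallery = {
    "man_feeding_horse": (["man", "horse"], {(0, 1): "feeding"}),
    "woman_riding_horse": (["woman", "horse"], {(0, 1): "riding"}),
    "man_riding_horse_with_hat": (["man", "horse", "hat"], {(0, 1): "riding", (0, 2): "wearing"}),
}
distances = {name: scene_graph_ged(*query, *g) for name, g in gallery.items()}
print(sorted(distances.items(), key=lambda kv: kv[1]))
```

Ranking gallery images by GED to the query graph yields a ground-truth ordering that does not depend on captions. For larger graphs, NetworkX's `graph_edit_distance` (with `node_match` and `edge_match` callbacks) or an approximate GED method would be more practical.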
The authors demonstrate SCENIR's generalizability by applying it to unannotated datasets through automated scene graph generation, and the work advances the state of the art in counterfactual image retrieval.