techminis

A naukri.com initiative

Image Credit: arXiv

Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering

  • Large Multimodal Models (LMMs) often struggle with in-context learning (ICL) when performing new tasks under limited supervision.
  • In smaller LMMs, ICL performance is inconsistent and does not always improve as more examples are added.
  • This inconsistency is attributed to LMMs being overwhelmed by unnecessary information in their image embeddings.
  • A meta-learning approach is proposed to enable few-shot capabilities in LMMs, using fixed soft prompts distilled from task-relevant image features.
  • These prompts can be adapted at test time with just a few examples, mitigating the overload of information in the image embeddings.
  • An attention-mapper module is introduced to aid prompt distillation; it can be integrated with the LLaVA v1.5 architecture.
  • The attention-mapper is jointly learned with the soft prompts, allowing task adaptation in LMMs with minimal data using only a few gradient steps.
  • Evaluation on the VL-ICL Bench shows that the proposed method consistently outperforms ICL and related prompt-tuning approaches.
  • Even under image perturbations, the method improves task induction and reasoning for visual question answering tasks.
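The distill-then-adapt idea above can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the paper's implementation: `AttentionMapper`, `adapt_at_test_time`, the dimensions, and the loss function are hypothetical stand-ins, and the LLaVA v1.5 language model that would consume the distilled prompts is omitted.

```python
import torch
import torch.nn as nn


class AttentionMapper(nn.Module):
    """Hypothetical sketch of an attention-mapper: learned query vectors
    (the soft prompts) cross-attend over image patch embeddings, distilling
    them into a small, fixed number of task-relevant prompt tokens."""

    def __init__(self, dim: int, num_prompts: int = 8):
        super().__init__()
        # Soft prompts, learned jointly with the attention weights.
        self.queries = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, dim) patch embeddings.
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        prompts, _ = self.attn(q, image_feats, image_feats)
        return prompts  # (batch, num_prompts, dim) distilled prompt tokens


def adapt_at_test_time(mapper, support_feats, support_loss_fn,
                       steps: int = 5, lr: float = 0.1):
    """Adapt only the soft prompts with a few gradient steps on a handful
    of support examples (support_loss_fn is a placeholder for the task loss)."""
    opt = torch.optim.SGD([mapper.queries], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = support_loss_fn(mapper(support_feats))
        loss.backward()
        opt.step()
    return mapper
```

In this sketch only the soft prompts are updated at test time, which keeps adaptation cheap; the distilled prompt tokens would then be prepended to the language model's input in place of the full image embedding sequence.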
