Arxiv

2d

read

309

img
dot

Image Credit: Arxiv

EQA-RM: A Generative Embodied Reward Model with Test-time Scaling

  • Reward Models (RMs) are crucial for large model alignment but are underexplored in complex embodied tasks like Embodied Question Answering (EQA).
  • EQA-RM is a generative multimodal reward model tailored for EQA, trained using Contrastive Group Relative Policy Optimization (C-GRPO) to capture fine-grained behavioral distinctions.
  • Because EQA-RM generates structured reward feedback rather than a single scalar, it supports test-time scaling: evaluation granularity can be adjusted dynamically at inference, without retraining.
  • EQA-RewardBench is a new benchmark based on OpenEQA designed for assessing EQA reward models.
  • EQA-RM, obtained by fine-tuning Qwen2-VL-2B-Instruct, achieves 61.9% accuracy on EQA-RM-Bench with high sample efficiency, outperforming a range of strong baselines and state-of-the-art models.
  • The code and dataset for EQA-RM can be accessed at https://github.com/UNITES-Lab/EQA-RM.
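
To make the test-time scaling idea more concrete, here is a minimal sketch of one common way to scale a generative reward model at inference: sample several critiques and average the scores they contain. This is an illustration under stated assumptions, not the authors' implementation. The `<score>` tag format, the helper names `parse_score` and `scaled_reward`, and the `generate` callable standing in for the reward model are all hypothetical.

```python
import re
from statistics import mean
from typing import Callable, Optional

# Assumed output format: each critique embeds its score as <score>N</score>.
SCORE_PATTERN = re.compile(r"<score>\s*(\d+(?:\.\d+)?)\s*</score>")


def parse_score(critique: str) -> Optional[float]:
    """Return the numeric score found in one critique, or None if absent."""
    match = SCORE_PATTERN.search(critique)
    return float(match.group(1)) if match else None


def scaled_reward(generate: Callable[[], str], num_samples: int) -> Optional[float]:
    """Average the scores parsed from num_samples sampled critiques.

    `generate` is a hypothetical stand-in for one stochastic call to a
    generative reward model. Critiques without a parsable score are skipped.
    """
    scores = []
    for _ in range(num_samples):
        score = parse_score(generate())
        if score is not None:
            scores.append(score)
    return mean(scores) if scores else None


# Usage with canned critiques in place of a real model.
canned = iter([
    "Reasoning is grounded in the video. <score>8</score>",
    "Answer ignores the second room. <score>6</score>",
    "No score given.",
    "Mostly correct. <score>7</score>",
])
reward = scaled_reward(lambda: next(canned), num_samples=4)
print(reward)
```

Raising `num_samples` spends more inference compute per evaluation without touching the model weights, which is the sense in which this kind of scaling needs no retraining.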
