Arxiv

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

  • Researchers introduce Athena-PRM, a multimodal process reward model (PRM) that efficiently scores each reasoning step in solutions to complex problems.
  • Conventional methods for building high-performance PRMs require step-level annotations, which are time-consuming and expensive to collect.
  • Athena-PRM leverages prediction consistency between weak and strong completers to generate high-quality process-labeled data.
  • With only 5,000 samples, Athena-PRM performs well across a range of scenarios and benchmarks.
  • Two further strategies, ORM (outcome reward model) initialization and up-sampling of negative data, boost PRM performance.
  • The approach is validated in three scenarios: verification, direct evaluation of reasoning-step correctness, and reward-ranked fine-tuning.
  • Athena-PRM consistently achieves superior performance across benchmarks. In test-time scaling, it improves performance by 10.2 points on WeMath and 7.1 points on MathVista.
  • It sets a new state of the art on VisualProcessBench, outperforming the previous best by 3.9 F1 points, which indicates accurate assessment of individual reasoning steps.
  • Athena-7B, developed using Athena-PRM as the reward model, significantly outperforms the baseline on five benchmarks.
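
The consistency-based labeling idea in the summary can be illustrated with a minimal sketch. The summary does not say how the two completers' predictions are obtained or compared, so the following is an assumption: each completer yields one boolean correctness estimate per reasoning step, and a step keeps its label only when both estimates agree. The function name `consistent_step_labels` and the sample data are hypothetical.

```python
from typing import List, Optional


def consistent_step_labels(
    weak: List[bool], strong: List[bool]
) -> List[Optional[bool]]:
    """Keep a step label only where both completers agree.

    Assumption: `weak` and `strong` hold one correctness estimate per
    reasoning step. Steps where the estimates disagree get None, meaning
    they would be left out of the training data.
    """
    if len(weak) != len(strong):
        raise ValueError("both label lists must cover the same steps")
    return [w if w == s else None for w, s in zip(weak, strong)]


# Hypothetical per-step estimates for one four-step solution.
weak_labels = [True, True, False, False]
strong_labels = [True, False, False, True]

labels = consistent_step_labels(weak_labels, strong_labels)
print(labels)  # [True, None, False, None]
```

Only the first and third steps survive here, which reflects the trade-off the summary describes: fewer labeled steps, but ones that two independent estimators agree on.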

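The verification scenario (test-time scaling) can be sketched as best-of-N selection: sample several candidate solutions, score each step with the PRM, and keep the highest-scoring candidate. The summary does not state how Athena-PRM combines step scores into a solution score; taking the minimum step score is an assumption made here for illustration, and the scores below are made up rather than produced by any model.

```python
from typing import List


def solution_score(step_scores: List[float]) -> float:
    """Aggregate per-step scores into one solution score.

    Assumption: a solution is only as good as its weakest step, so the
    minimum is used. Other aggregations (product, last step) are possible.
    """
    if not step_scores:
        raise ValueError("a solution needs at least one step")
    return min(step_scores)


def best_of_n(candidates: List[List[float]]) -> int:
    """Return the index of the candidate with the highest solution score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(range(len(candidates)), key=lambda i: solution_score(candidates[i]))


# Hypothetical PRM step scores for three sampled solutions.
candidates = [
    [0.9, 0.2, 0.8],   # one weak step in the middle
    [0.7, 0.6, 0.65],  # uniformly decent
    [0.95, 0.9, 0.1],  # fails at the final step
]

best = best_of_n(candidates)
print(best)  # 1
```

The second candidate wins even though the others contain higher individual step scores, because the step-level view penalizes a single bad step.
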