
Arxiv


$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction

  • Audio-Visual Target Speaker Extraction (AV-TSE) aims to isolate a target speaker's speech from a mixture of voices, using visual cues from the speaker to guide the extraction.
  • A model-agnostic strategy called Mask-And-Recover (MAR) is proposed to improve extraction quality by integrating contextual correlations.
  • The Fine-grained Confidence Score (FCS) model is introduced to assess extraction quality and guide improvement on low-quality segments.
  • The proposed model-agnostic training paradigm demonstrated consistent performance improvements across various metrics on the VoxCeleb2 dataset.
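
The summary above does not describe how MAR or FCS are implemented, so the following is only a rough sketch of the two general ideas: hiding segments of the input so a model must recover them from context, and weighting a training loss toward segments with low confidence. The function names (`mask_segments`, `confidence_weighted_loss`), the fixed segment length, and the `1 - confidence` weighting are all assumptions made for illustration, not the paper's method.

```python
import random


def mask_segments(frames, segment_len, mask_ratio, seed=0):
    """Zero out a random subset of fixed-length segments (hypothetical helper).

    Returns the masked copy and a per-frame boolean list marking masked frames.
    """
    rng = random.Random(seed)
    num_segments = len(frames) // segment_len
    num_masked = round(mask_ratio * num_segments)
    masked_ids = set(rng.sample(range(num_segments), num_masked))
    # A frame is masked when the segment it belongs to was selected.
    is_masked = [(i // segment_len) in masked_ids for i in range(len(frames))]
    masked = [0.0 if m else x for x, m in zip(frames, is_masked)]
    return masked, is_masked


def confidence_weighted_loss(segment_errors, confidences):
    """Average per-segment errors, weighting low-confidence segments more.

    Assumption: weight = 1 - confidence, normalized to sum to 1.
    """
    weights = [1.0 - c for c in confidences]
    total = sum(weights)
    if total == 0:
        # Every segment is fully confident: fall back to a plain mean.
        weights = [1.0 / len(weights)] * len(weights)
    else:
        weights = [w / total for w in weights]
    return sum(w * e for w, e in zip(weights, segment_errors))
```

With errors `[1.0, 3.0]` and confidences `[0.9, 0.1]`, the second segment receives about nine times the weight of the first, so the weighted loss is roughly 2.8 rather than the plain mean of 2.0.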
