Image Credit: MIT

AI learns how vision and sound are connected, without human intervention

  • MIT researchers have developed an AI model that learns to connect vision and sound without human intervention, mimicking how humans naturally learn.
  • The approach could have applications in journalism and film production, and could improve a robot's understanding of real-world environments.
  • The model was trained to align audio and visual data from unlabeled video clips, which improved its performance on video retrieval and audiovisual scene classification tasks (a retrieval sketch appears after this list).
  • The method helps the model learn a finer-grained correspondence between individual video frames and the audio that plays alongside them.
  • The model, named CAV-MAE Sync, splits audio into smaller windows so that each window can be matched to the video frames it co-occurs with (see the windowing sketch below).
  • The researchers also made architectural tweaks to balance the model's two learning objectives, introducing separate data representations for the contrastive and reconstructive objectives (see the dual-objective sketch below).
  • Together, these enhancements improved video retrieval accuracy and scene classification in audiovisual scenarios.
  • The researchers next aim to integrate models that produce better data representations and to incorporate text data, further extending the system's capabilities.
  • The work was funded in part by the German Federal Ministry of Education and Research and the MIT-IBM Watson AI Lab.
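
To make the windowing idea concrete, here is a minimal sketch, not MIT's code, of how a clip's audio can be sliced into one short window per video frame so that correspondence is learned frame by frame rather than clip by clip. The function name and all parameters are illustrative assumptions.

```python
# A minimal sketch (not MIT's code) of the audio-windowing idea behind
# CAV-MAE Sync: instead of one embedding for a whole clip's audio, the
# audio is split into short windows, each aligned with the video frame
# it overlaps, so vision/audio correspondence can be learned per frame.
# All names and parameters here are illustrative assumptions.

import numpy as np

def split_audio_into_frame_windows(audio, sample_rate, num_frames, fps):
    """Slice a mono waveform into one window per video frame.

    audio:       1-D array of samples covering the whole clip
    sample_rate: audio samples per second
    num_frames:  number of video frames in the clip
    fps:         video frames per second
    Returns a list of (frame_index, window) pairs.
    """
    samples_per_frame = int(sample_rate / fps)
    windows = []
    for i in range(num_frames):
        start = i * samples_per_frame
        end = min(start + samples_per_frame, len(audio))
        windows.append((i, audio[start:end]))
    return windows

# Example: a 2-second clip with 16 kHz audio and 4 fps video
# yields 8 windows of 4000 samples, one per frame.
clip = np.random.randn(2 * 16000)
pairs = split_audio_into_frame_windows(clip, sample_rate=16000,
                                       num_frames=8, fps=4)
print([(i, len(w)) for i, w in pairs])
```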
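The second tweak, separate representations for the two objectives, can be sketched as a shared backbone with two heads: one embedding used for contrastive audio-video matching, one output used for reconstruction. This is a hedged PyTorch sketch of the general pattern, not the published CAV-MAE Sync architecture; CAV-MAE Sync reportedly gives each objective its own dedicated tokens, and here two projection heads play that role as a simplification. The tiny linear encoders and the 1:1 loss weighting are assumptions.

```python
# Hedged sketch: give the contrastive and reconstructive objectives
# their own representations instead of making one embedding serve both.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualObjectiveEncoder(nn.Module):
    def __init__(self, in_dim=128, hid_dim=256, emb_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.GELU())
        # Separate heads: one representation for contrastive matching,
        # one for reconstructing the input.
        self.contrastive_head = nn.Linear(hid_dim, emb_dim)
        self.reconstruction_head = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.contrastive_head(h), self.reconstruction_head(h)

def info_nce(a, v, temperature=0.07):
    """Symmetric contrastive loss: matching audio/video pairs score high."""
    a, v = F.normalize(a, dim=-1), F.normalize(v, dim=-1)
    logits = a @ v.t() / temperature
    targets = torch.arange(len(a))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

audio_enc, video_enc = DualObjectiveEncoder(), DualObjectiveEncoder()
audio, video = torch.randn(16, 128), torch.randn(16, 128)

a_emb, a_rec = audio_enc(audio)
v_emb, v_rec = video_enc(video)

# Each objective trains its own representation; the equal weighting of
# the two losses below is an assumption standing in for a tuned balance.
loss = info_nce(a_emb, v_emb) + F.mse_loss(a_rec, audio) + F.mse_loss(v_rec, video)
loss.backward()
print(float(loss))
```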
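Finally, video retrieval with aligned embeddings reduces to nearest-neighbour search in the shared space: an audio query is embedded and scored against a library of video embeddings. The cosine-similarity ranking below is a generic illustration using random vectors as stand-ins for a trained model's output, not the evaluation protocol from the paper.

```python
# Sketch of cross-modal retrieval over a shared embedding space: rank
# candidate video embeddings by cosine similarity to an audio query.

import numpy as np

def retrieve(query, candidates, top_k=3):
    """Return (index, score) pairs for the top_k most similar candidates."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:top_k]
    return list(zip(order.tolist(), scores[order].tolist()))

rng = np.random.default_rng(0)
video_embeddings = rng.normal(size=(100, 64))                    # 100 clips
audio_query = video_embeddings[42] + 0.1 * rng.normal(size=64)   # noisy match

# If the modalities are well aligned, the true clip (index 42) ranks first.
print(retrieve(audio_query, video_embeddings))
```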
