techminis

A naukri.com initiative


Unite · 4w read

Image Credit: Unite

Rethinking Video AI Training with User-Focused Data

  • A new paper discusses the limitations of popular foundation models in rendering the content users actually request.
  • Examples showcase the difficulty OpenAI's Sora model and Adobe's Firefly generative diffusion engine have in accurately depicting certain concepts.
  • The researchers introduce a new dataset methodology named VideoUFO, which aims to align data collections more closely with user expectations.
  • The VideoUFO dataset comprises 1.9 million videos on user-focused topics and shares only 0.29% of its content with popular existing datasets.
  • The approach filters Creative Commons-licensed YouTube videos against pre-estimated user needs, ensuring novel content selection.
  • Emphasis is placed on curating data around user demand to counter the biased distribution of internet content in generative video systems.
  • The topic-analysis methodology uses SentenceTransformers embeddings and K-means clustering, with GPT-4o employed to refine the resulting dataset topics.
  • Videos are scraped according to topic criteria; each entry carries both a brief and a detailed caption, and video quality is assessed with methods from the VBench project.
  • The paper evaluates generative models' performance with the BenchUFO benchmark, highlighting varied success rates on user-focused topics across different architectures.
  • Current text-to-video models perform inconsistently on user-focused topics such as 'giant squid' or 'Van Gogh', owing to insufficient coverage in their training data.
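The topic-discovery step summarized above (embed user prompts, then cluster them into topics) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the toy 2-D vectors stand in for SentenceTransformers embeddings, and the hand-rolled K-means loop stands in for a production clustering library; the GPT-4o topic-naming step is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "prompt embeddings": two well-separated groups standing in for
# two user-requested topics (real embeddings would come from a text encoder).
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),   # e.g. animal prompts
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),   # e.g. landscape prompts
])

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: returns (centroids, labels)."""
    init_rng = np.random.default_rng(seed)
    centroids = points[init_rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return centroids, labels

centroids, labels = kmeans(embeddings, k=2)
# Each resulting cluster would then be summarized into a human-readable
# topic name (the paper reportedly uses GPT-4o for this refinement step).
print(sorted(np.bincount(labels).tolist()))
```

The scraping stage described in the summary would then query videos per discovered topic; only the clustering logic is shown here.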
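A BenchUFO-style check, as described above, compares what the user asked for against what the model actually produced. The sketch below is a stand-in, not the benchmark itself: `embed` is a toy hashed bag-of-words featurizer rather than a real text encoder, and the "generated caption" strings are hypothetical; only the scoring idea (embedding similarity between prompt and generated-video caption) is illustrated.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy 'embedding': bag of hashed words, L2-normalized."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def topic_score(prompt: str, generated_caption: str) -> float:
    """Cosine similarity between the user prompt and the video's caption."""
    return float(embed(prompt) @ embed(generated_caption))

prompt = "a giant squid swimming in the deep sea"
# Hypothetical captions of two generated videos: one on-topic, one not.
on_topic = topic_score(prompt,
                       "a giant squid swimming in the deep sea at night")
off_topic = topic_score(prompt,
                        "a painting in the style of van gogh")
# A model that renders the requested concept should score higher.
print(on_topic > off_topic)
```

In the benchmark proper, the caption would come from a video-captioning model run on the generated clip; aggregating such scores per topic is what surfaces the uneven performance on topics like 'giant squid' noted above.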
