Image Credit: Arxiv

Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement

  • Recent MIDI-to-audio synthesis methods using deep neural networks have been successful in generating high-quality, expressive instrumental tracks.
  • These methods usually require MIDI annotations for supervised training, which limits the diversity of instrument timbres and expression styles in the output.
  • The paper introduces CoSaRef, a MIDI-to-audio synthesis method that does not depend on paired MIDI-audio datasets.
  • CoSaRef works in two main steps: it first renders a rough audio track from the MIDI input via concatenative synthesis of one-shot samples, then refines that render with a diffusion-based deep generative model trained without MIDI annotations (see the sketch after this list).
  • This method enhances the diversity of timbres and expression styles in the generated audio output.
  • CoSaRef also gives fine-grained control over timbre and expression through sample selection and additional MIDI design, much like traditional workflows in digital audio workstations.
  • Experiments demonstrated that CoSaRef can produce realistic tracks while maintaining detailed timbre control via one-shot samples.
  • Despite not being trained with MIDI annotations, CoSaRef outperformed a state-of-the-art timbre-controllable method based on MIDI supervision in both objective and subjective evaluations.
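
The two-step pipeline above can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than the paper's implementation: `Note`, `one_shot`, and `DiffusionRefiner` are illustrative names, the one-shot is a toy decaying sine in place of a recorded instrument sample, and the refiner is an identity placeholder marking where CoSaRef would apply its diffusion model trained on audio alone.

```python
# Minimal sketch of the CoSaRef two-step idea, not the authors' code.
# Step 1: concatenative synthesis -- place a pitch-shifted one-shot sample
# at each MIDI note onset. Step 2: hand the rough render to a diffusion
# refiner; `DiffusionRefiner` is a hypothetical stand-in for the paper's
# generative model, which is trained without MIDI annotations.

from dataclasses import dataclass
import numpy as np

SR = 22050  # sample rate, Hz


@dataclass
class Note:
    pitch: int      # MIDI note number
    velocity: int   # 0-127
    start: float    # onset, seconds
    end: float      # offset, seconds


def one_shot(duration: float, base_pitch: int = 60) -> np.ndarray:
    """Toy one-shot sample: a decaying sine at middle C. A real system
    would pick a recorded instrument sample from a library here."""
    t = np.arange(int(duration * SR)) / SR
    freq = 440.0 * 2 ** ((base_pitch - 69) / 12)
    return np.sin(2 * np.pi * freq * t) * np.exp(-3.0 * t)


def resample_to_pitch(sample: np.ndarray, semitones: float) -> np.ndarray:
    """Naive sampler-style pitch shift by resampling (also changes
    duration; production systems would preserve the time scale)."""
    ratio = 2 ** (semitones / 12)
    idx = np.arange(0, len(sample), ratio)
    return np.interp(idx, np.arange(len(sample)), sample)


def concatenative_render(notes: list[Note], total_dur: float) -> np.ndarray:
    """Step 1: overlap-add pitch-shifted, velocity-scaled one-shots at
    each note onset to get a rough but timing-accurate track."""
    out = np.zeros(int(total_dur * SR))
    base = one_shot(duration=1.0, base_pitch=60)
    for n in notes:
        shifted = resample_to_pitch(base, n.pitch - 60)
        gain = n.velocity / 127.0
        s = int(n.start * SR)
        e = min(s + len(shifted), len(out))
        out[s:e] += gain * shifted[: e - s]
    return out


class DiffusionRefiner:
    """Hypothetical placeholder for the diffusion model that maps a rough
    concatenative render to realistic audio. Because it is trained on
    audio alone, no MIDI-audio paired data is required."""

    def refine(self, rough: np.ndarray) -> np.ndarray:
        return rough  # identity stand-in for the generative refinement


if __name__ == "__main__":
    melody = [Note(60, 100, 0.0, 0.5), Note(64, 90, 0.5, 1.0), Note(67, 110, 1.0, 1.5)]
    rough = concatenative_render(melody, total_dur=2.5)
    audio = DiffusionRefiner().refine(rough)
    print(audio.shape)
```

Swapping a different sample into `one_shot` is this sketch's analogue of the sample-selection control the bullets describe: the concatenative stage fixes timing, pitch, and rough timbre, and the generative stage only has to make the result sound realistic.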
