Arxiv

BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation

  • This paper introduces BemaGANv2, an advanced GAN-based vocoder for high-fidelity and long-term audio generation.
  • BemaGANv2 builds upon the original BemaGAN architecture by incorporating architectural innovations like the Anti-aliased Multi-Periodicity composition (AMP) module in the generator.
  • The generator in BemaGANv2 uses the Snake activation function to better model periodic structures in audio.
  • BemaGANv2's discriminator framework pairs the Multi-Envelope Discriminator (MED), which extracts temporal envelope features, with the Multi-Resolution Discriminator (MRD); the combination is intended to model long-range dependencies in audio more accurately.
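  • The paper evaluates several discriminator configurations, including MSD + MED, MSD + MRD, and MPD + MED + MRD (MSD: Multi-Scale Discriminator; MPD: Multi-Period Discriminator), using both objective metrics and subjective listening evaluations.
  • The objective metrics are Fréchet Audio Distance (FAD), Structural Similarity Index (SSIM), Pearson Linear Correlation Coefficient (PLCC), and Mel-Cepstral Distortion (MCD); the subjective evaluations report Mean Opinion Score (MOS) and Similarity MOS (SMOS).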
  • The paper provides a tutorial on model architecture, training methodology, and implementation details to ensure reproducibility.
  • The code and pre-trained models for BemaGANv2 are available at https://github.com/dinhoitt/BemaGANv2.
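
To make the Snake activation mentioned above concrete, here is a minimal sketch of its commonly published form, snake(x) = x + sin^2(alpha * x) / alpha. It is an illustration only and is not taken from the BemaGANv2 repository. The function names `snake` and `snake_layer` are hypothetical, and a fixed scalar `alpha` is an assumption; vocoders that use Snake typically learn `alpha` per channel.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation in its commonly published form:
    x + sin^2(alpha * x) / alpha.

    alpha sets the frequency of the periodic term. It is a fixed
    scalar here (an assumption made for this sketch).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_layer(xs, alpha: float = 1.0):
    """Apply snake element-wise to a sequence of samples."""
    return [snake(x, alpha) for x in xs]


if __name__ == "__main__":
    samples = [-1.0, 0.0, 0.5, math.pi / 2]
    print(snake_layer(samples, alpha=1.0))
```

The identity term keeps the function monotonic, while the squared-sine term adds a periodic component, which is why this activation is a natural fit for periodic signals such as audio.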
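
Two of the objective metrics listed above, PLCC and MCD, have simple closed forms, sketched below using the standard textbook definitions. The summary does not say which features the paper correlates for PLCC or how frames are aligned for MCD, so those choices are assumptions here: inputs are taken to be already time-aligned, and the caller is assumed to have dropped the 0th cepstral coefficient. The function names are hypothetical.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mcd_frame(c_ref, c_syn):
    """Mel-cepstral distortion (dB) for one frame:
    (10 / ln 10) * sqrt(2 * sum_d (c_ref[d] - c_syn[d])^2).
    """
    if len(c_ref) != len(c_syn):
        raise ValueError("frames must have the same number of coefficients")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref, c_syn))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mcd(ref_frames, syn_frames):
    """Average MCD over time-aligned frames (alignment is assumed)."""
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need the same non-zero number of frames")
    per_frame = [mcd_frame(r, s) for r, s in zip(ref_frames, syn_frames)]
    return sum(per_frame) / len(per_frame)
```

A PLCC closer to 1 means stronger linear agreement between reference and generated features, and a lower MCD means the generated spectrum is closer to the reference.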
