<ul data-eligibleForWebStory="true"><li>Advances in speech generation technology raise concerns about the potential misuse of synthetic speech.</li><li>The study addresses three tasks: single-model attribution in an open-world scenario, model attribution in a closed-world scenario, and distinguishing synthetic from real speech.</li><li>As vocoder fingerprints, the research uses standardized average residuals between audio signals and their filtered versions.</li><li>These fingerprints achieve over 99% average AUROC on the LJSpeech and JSUT datasets across various tasks.</li><li>An accompanying robustness study shows that the approach is resilient to noise to a certain extent.</li></ul>
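
The fingerprint idea in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation: the moving-average low-pass filter, the equal-length time-domain signals, the correlation-based score, and the names `residual`, `fingerprint`, and `score` are all assumptions made here for illustration.

```python
import numpy as np


def residual(signal, kernel_size=5):
    """Residual between a signal and its low-pass filtered version.

    Assumption: a simple moving-average filter stands in for whatever
    filter the study actually uses.
    """
    kernel = np.ones(kernel_size) / kernel_size
    filtered = np.convolve(signal, kernel, mode="same")
    return signal - filtered


def fingerprint(signals, kernel_size=5):
    """Standardized average residual over equal-length signals from one vocoder."""
    residuals = np.stack([residual(s, kernel_size) for s in signals])
    mean_residual = residuals.mean(axis=0)
    # Standardize to zero mean and unit variance (epsilon avoids division by zero).
    return (mean_residual - mean_residual.mean()) / (mean_residual.std() + 1e-12)


def score(signal, fp, kernel_size=5):
    """Correlation between a signal's standardized residual and a fingerprint.

    Assumption: correlation is used here as a hypothetical attribution score.
    """
    r = residual(signal, kernel_size)
    r = (r - r.mean()) / (r.std() + 1e-12)
    return float(np.mean(r * fp))
```

In this sketch, a higher score suggests that the signal carries the same residual pattern as the fingerprint. Where to place the decision threshold, and how to handle signals of different lengths, are left open.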

Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints
