Introduction of an unsupervised, anomaly-detection-based framework for identifying audio patterns in musical samples (loops), addressing open challenges in music information retrieval.
Combination of deep feature extraction with unsupervised anomaly detection, using a pre-trained Hierarchical Token-Semantic Audio Transformer (HTS-AT) together with a Feature Fusion Mechanism (FFM).
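As a minimal sketch of this feature-extraction step, the snippet below obtains fixed-size loop embeddings from a pre-trained HTS-AT encoder via the laion_clap package, whose audio branch is an HTS-AT and whose enable_fusion flag activates a feature fusion mechanism; the specific checkpoint, file paths, and preprocessing are assumptions and may differ from the paper's setup.

```python
# Sketch: embedding audio loops with a pre-trained HTS-AT encoder.
# laion_clap's audio branch is an HTS-AT; enable_fusion=True turns on its
# feature fusion mechanism for variable-length inputs. The paper's exact
# checkpoint and preprocessing are not specified here (assumption).
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=True)  # HTS-AT audio encoder + FFM
model.load_ckpt()  # downloads a default pre-trained checkpoint

loop_files = ["loops/bass_01.wav", "loops/guitar_07.wav"]  # hypothetical paths
embeddings = model.get_audio_embedding_from_filelist(x=loop_files, use_tensor=False)
print(embeddings.shape)  # (num_loops, embedding_dim), e.g. (2, 512)
```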
Utilization of one-class Deep Support Vector Data Description (Deep SVDD) to learn normative audio patterns by mapping them into a compact hypersphere in latent space, yielding improved anomaly separation in evaluations on curated bass and guitar datasets.
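The following sketch shows the standard one-class Deep SVDD objective (Ruff et al., 2018) applied to pre-extracted embeddings: a small network is trained to pull normal samples toward a fixed hypersphere center. The network sizes, learning rate, and epoch count are illustrative assumptions, not the paper's settings.

```python
# Sketch: one-class Deep SVDD over pre-extracted loop embeddings.
import torch
import torch.nn as nn

class SVDDNet(nn.Module):
    """Small bias-free MLP; bias terms are omitted to avoid the trivial
    'hypersphere collapse' solution of Deep SVDD."""
    def __init__(self, in_dim: int = 512, rep_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128, bias=False), nn.ReLU(),
            nn.Linear(128, rep_dim, bias=False),
        )

    def forward(self, x):
        return self.net(x)

def train_deep_svdd(embeddings: torch.Tensor, epochs: int = 50):
    model = SVDDNet(in_dim=embeddings.shape[1])
    # Fix the hypersphere center c as the mean of an initial forward pass.
    with torch.no_grad():
        c = model(embeddings).mean(dim=0)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    for _ in range(epochs):
        opt.zero_grad()
        dist = torch.sum((model(embeddings) - c) ** 2, dim=1)
        loss = dist.mean()  # pull normal samples toward the center
        loss.backward()
        opt.step()
    return model, c

# Example usage with embeddings from the extraction step above:
# model, c = train_deep_svdd(torch.from_numpy(embeddings).float())
```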
Presentation of a fully unsupervised solution for processing diverse audio samples, identifying patterns effectively through distance-based scoring in the learned latent space and overcoming limitations of prior approaches.
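A minimal sketch of the distance-based scoring step is shown below: a loop's anomaly score is its squared distance to the hypersphere center, with larger distances indicating deviation from the learned norm. The percentile-based threshold in the usage comment is an assumption, not a choice taken from the paper.

```python
# Sketch: distance-based anomaly scoring in the learned latent space.
import torch

def anomaly_scores(model: torch.nn.Module, c: torch.Tensor,
                   embeddings: torch.Tensor) -> torch.Tensor:
    """Squared distance of each embedding's latent representation to center c."""
    with torch.no_grad():
        z = model(embeddings)
    return torch.sum((z - c) ** 2, dim=1)

# Example (hypothetical threshold): flag loops whose score exceeds the
# 95th percentile of the training scores.
# train_scores = anomaly_scores(model, c, train_embeddings)
# threshold = torch.quantile(train_scores, 0.95)
# is_anomalous = anomaly_scores(model, c, new_embeddings) > threshold
```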