Scientists use autonomous recording units (ARUs) to capture audio in forests and other habitats, both to study ecosystems and to identify animal and insect species.
Google Research notes that bird vocalizations are an important signal for understanding food systems and forest health, which makes audio-based identification of birds especially valuable.
The BirdCLEF+ 2025 competition on Kaggle asks participants to build a classification model that predicts species from audio recordings; because some of the competition's species fall outside existing classifiers such as the Google Bird Vocalization (GBV) classifier, a custom model is needed.
This guide outlines how to build a bird vocalization classifier with techniques similar to Google Research's approach, using the BirdCLEF+ 2025 competition dataset for training.
The training dataset includes 28,564 audio recordings in the train_audio directory, representing various bird species, with taxonomy details provided in the taxonomy.csv file.
The dataset covers 206 species classes, 63 of which are not handled by the GBV classifier; in addition, some classes are heavily imbalanced and contain low-quality recordings.
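To make the imbalance concrete, one can count recordings per class before training. The sketch below assumes the usual BirdCLEF layout, in which train_audio holds one subdirectory of .ogg files per species code; verify this against the actual dataset.

```python
# Sketch: quantify class imbalance by counting recordings per species.
# Assumes the usual BirdCLEF layout: train_audio/<species_code>/*.ogg.
from pathlib import Path

train_audio = Path("train_audio")  # assumed location of the competition audio
counts = {
    species_dir.name: sum(1 for _ in species_dir.glob("*.ogg"))
    for species_dir in sorted(train_audio.iterdir())
    if species_dir.is_dir()
}

ranked = sorted(counts.items(), key=lambda kv: kv[1])
print(f"{len(counts)} classes, {sum(counts.values())} recordings total")
print("rarest 5:", ranked[:5])
print("most common 5:", ranked[-5:])
```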
Many training recordings also contain spoken annotations from the recordist; strategies for handling both the class imbalance and these speech segments are discussed in the classifier-building section.
The classifier pipeline splits each recording into fixed-length chunks, converts the chunks to mel spectrograms, and trains an EfficientNet B0 model, drawing on pre-trained models from Google Research; a sketch of this preprocessing follows below.
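The following is a minimal sketch of that pipeline, assuming librosa for audio I/O and torchvision for the EfficientNet B0 backbone. The 32 kHz sample rate, 5-second chunks, mel parameters, and file path are illustrative choices, not necessarily the guide's exact settings.

```python
import librosa
import numpy as np
import torch
from torchvision.models import efficientnet_b0

SR = 32_000      # assumed sample rate
CHUNK_SEC = 5    # split each recording into fixed-length windows
N_MELS = 128

def audio_to_mel_chunks(path: str) -> list[np.ndarray]:
    """Load a recording, split it into fixed-length chunks, and return
    one log-mel spectrogram per chunk."""
    audio, _ = librosa.load(path, sr=SR)
    chunk_len = SR * CHUNK_SEC
    chunks = []
    for start in range(0, len(audio) - chunk_len + 1, chunk_len):
        mel = librosa.feature.melspectrogram(
            y=audio[start:start + chunk_len], sr=SR, n_mels=N_MELS
        )
        chunks.append(librosa.power_to_db(mel, ref=np.max))
    return chunks

# EfficientNet B0 expects 3-channel input, so the single-channel spectrogram
# is repeated across channels; the classifier head is resized to 206 classes.
model = efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 206)
model.eval()

mels = audio_to_mel_chunks("train_audio/some_species/XC000001.ogg")  # illustrative path
x = torch.from_numpy(mels[0]).float().unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)
with torch.no_grad():
    logits = model(x)  # shape: (1, 206)
```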
To address the imbalance and the speech annotations, the approach pseudo-labels the unlabeled soundscape recordings, augments minority classes, and generates mel spectrograms as model input, as sketched after this paragraph.
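Below is a hedged sketch of both steps, reusing the mel-chunk format and model from the previous block. The 0.9 confidence threshold and the specific waveform augmentations (gain, noise, time shift) are illustrative choices, not the guide's exact values.

```python
import numpy as np
import torch

@torch.no_grad()
def pseudo_label(mel_chunks, model, threshold=0.9):
    """Keep a predicted class for a chunk only when the model is confident;
    returns (chunk_index, class_index) pairs usable as extra training labels."""
    model.eval()
    labels = []
    for i, mel in enumerate(mel_chunks):
        x = torch.from_numpy(mel).float().unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)
        probs = torch.softmax(model(x), dim=1)
        conf, cls = probs.max(dim=1)
        if conf.item() >= threshold:
            labels.append((i, cls.item()))
    return labels

def augment_waveform(audio: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Simple waveform augmentations for minority classes: random gain,
    light additive noise, and a circular time shift."""
    audio = audio * rng.uniform(0.8, 1.2)
    audio = audio + rng.normal(0.0, 0.005, size=audio.shape)
    return np.roll(audio, rng.integers(0, audio.shape[0]))
```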
Training accuracy climbs above 90%, but validation accuracy fluctuates, a sign of overfitting and of remaining room for improvement in generalization.
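One common response to this pattern is early stopping: halt training once validation accuracy stops improving and keep the best checkpoint. This is a generic sketch rather than the guide's training loop; train_one_epoch and evaluate are hypothetical placeholders.

```python
import torch

best_val, patience, bad_epochs = 0.0, 5, 0
for epoch in range(100):
    train_one_epoch(model, train_loader)   # hypothetical helper
    val_acc = evaluate(model, val_loader)  # hypothetical helper
    if val_acc > best_val:
        best_val, bad_epochs = val_acc, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation accuracy has plateaued
```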