This project developed a deep learning system that classifies gender from voice samples using convolutional neural networks (CNNs) and mel spectrograms.
By learning from mel spectrograms, the CNN identified differences in how male and female voices are distributed across frequency and time.
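As an illustration, a mel spectrogram of this kind could be computed with librosa; the sampling rate and number of mel bands below are assumptions, not the project's actual settings:

```python
import librosa
import numpy as np

def extract_mel_spectrogram(path, sr=16000, n_mels=128):
    """Load an audio file and convert it to a log-scaled mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)                 # resample to a fixed rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)      # log scale suits CNN input
```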
The project aimed to construct a robust gender classification model through data preparation, feature extraction, CNN training, and performance evaluation.
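The summary does not specify the network architecture, so the following is only a minimal PyTorch sketch of a spectrogram classifier of this general shape, with hypothetical layer sizes:

```python
import torch.nn as nn

class GenderCNN(nn.Module):
    """Minimal CNN mapping a (1, n_mels, time) spectrogram to two classes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Global pooling keeps the head independent of the clip's time length
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```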
Working with real-world audio data introduced practical challenges, underscoring the complexity involved in building reliable deep learning models.
Two datasets were used, and various audio augmentations were applied during training to improve model generalization.
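The specific augmentations are not listed here; a sketch of common waveform-level choices, assuming librosa, with illustrative probabilities and ranges:

```python
import numpy as np
import librosa

def augment(y, sr):
    """Randomly perturb a waveform; each transform fires with probability 0.5."""
    if np.random.rand() < 0.5:
        y = y + 0.005 * np.random.randn(len(y))      # additive background noise
    if np.random.rand() < 0.5:
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=np.random.uniform(-2, 2))
    if np.random.rand() < 0.5:
        y = librosa.effects.time_stretch(y, rate=np.random.uniform(0.9, 1.1))
    return y
```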
Principal Component Analysis (PCA) was considered but found unsuitable for CNN-based audio classification, since projecting spectrograms onto principal components discards the local time-frequency structure that convolutional layers exploit.
CNNs trained on spectrograms instead learn task-specific features that capture the time-frequency relationships critical to speech analysis.
Spectrograms are also visually interpretable in a way that abstract PCA components are not, making pitch, formants, and energy in the signal easier to inspect.
Instead of PCA, the CNN learned directly from high-resolution spectrograms, with regularization techniques applied to mitigate overfitting.
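The exact regularization used is not stated; two standard options consistent with this description are dropout in the classifier head and L2 weight decay in the optimizer. The rates and layer sizes below are placeholders:

```python
import torch
import torch.nn as nn

head = nn.Sequential(              # stand-in classifier head, dimensions assumed
    nn.Flatten(),
    nn.Linear(32 * 32 * 25, 64),   # e.g. a 32-channel, 32x25 feature map
    nn.ReLU(),
    nn.Dropout(p=0.5),             # randomly zeroes activations during training
    nn.Linear(64, 2),
)
# Weight decay adds an L2 penalty on the weights at every optimizer step
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3, weight_decay=1e-4)
```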
This study highlighted that gender classification from voice involves nuanced patterns beyond pitch alone, patterns that modern deep learning techniques can capture effectively.
The system achieved over 93% accuracy and performed reliably on real-world audio data, pointing to promising directions for further work in voice analysis.