WSW 2.0 is an automated framework for analyzing vocal interactions in preschool classrooms.
The framework integrates wav2vec2-based speaker classification and Whisper speech transcription to enhance accuracy and scalability.
WSW 2.0 achieved a weighted F1 score of .845, accuracy of .846, and an error-corrected kappa of .672 for speaker classification.
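As a rough illustration of how metrics of this kind are typically computed, the sketch below derives accuracy, support-weighted F1, and Cohen's kappa from paired reference and predicted speaker labels. It is not the authors' evaluation code: the label names `child` and `teacher` and the toy data are hypothetical, and the kappa shown is the standard chance-corrected Cohen's kappa, which may differ from the "error-corrected kappa" reported above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Standard Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[l] * pred_counts[l] for l in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case: chance agreement is already perfect
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels, not data from the study.
reference = ["child", "child", "child", "child", "teacher", "teacher"]
predicted = ["child", "child", "child", "teacher", "teacher", "teacher"]

acc = accuracy(reference, predicted)
f1w = weighted_f1(reference, predicted)
kappa = cohen_kappa(reference, predicted)
```

Weighting F1 by support keeps the score from being dominated by whichever speaker class is rarer, while kappa discounts the agreement a classifier would reach by chance given the label frequencies.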
The framework also showed high absolute-agreement intraclass correlations with expert transcriptions across a range of classroom language features, suggesting it could support large-scale educational research.
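To make "absolute agreement" concrete, here is a minimal sketch of one common absolute-agreement intraclass correlation, the two-way single-measure ICC(A,1) in McGraw and Wong's notation. The source does not say which ICC form was used, so this choice is an assumption, and the function name and toy ratings are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """ICC(A,1): two-way, single-measure, absolute-agreement ICC.

    `ratings` is a list of rows, one per subject (e.g. a recording),
    each holding one score per rater (e.g. automated vs. expert).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical toy scores: columns are (automated, expert).
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

icc_identical = icc_absolute_agreement(identical)
icc_offset = icc_absolute_agreement(offset)
```

In the toy data, identical scores give an ICC of 1, while a constant offset of one unit between raters lowers it to about 0.67 even though the two columns are perfectly correlated. That penalty for systematic bias is what distinguishes absolute agreement from consistency-type ICCs.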