Speaker diarization in meeting audio

Authors:
Tin Lay Nwe;Hanwu Sun;Haizhou Li;Susanto Rahardja
Affiliations:
Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632 (all authors)
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation data set for the Multiple Distant Microphone (MDM) task. Our implementation includes three components: initial clustering, non-speech removal, and cluster purification. Initial clusters are generated using Direction of Arrival (DOA) information and bootstrap clustering. The non-speech removal component employs multiple-GMM modeling for speech/non-speech classification. In addition, a novel system fusion strategy that uses information from the Receiver Operating Characteristic (ROC) curve is proposed for this component. Finally, a consensus clustering approach, together with an iterative GMM clustering method, is employed for speaker cluster purification. The system achieves an overall diarization error rate (DER) of 10.81%.
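
For readers unfamiliar with the metric, DER in the NIST Rich Transcription evaluations is commonly computed as the sum of missed speech, false alarm speech, and speaker confusion time, divided by the total scored speech time. The sketch below illustrates that ratio only. The function name and the durations are hypothetical and chosen for illustration; the abstract does not report how the 10.81% figure breaks down into these components.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Return DER as a fraction of total scored speech time.

    All arguments are durations in seconds. This is a simplified view of
    the metric: it assumes the three error durations have already been
    measured against a reference segmentation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical durations (seconds), not taken from the paper.
der = diarization_error_rate(
    missed=30.0,
    false_alarm=20.0,
    speaker_error=50.0,
    total_speech=1000.0,
)
print(f"DER = {der:.2%}")  # prints "DER = 10.00%"
```

Under this definition, the non-speech removal component mainly targets the false alarm term, while cluster purification mainly targets the speaker error term.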