Speaker diarization in meeting audio

  • Authors:
  • Tin Lay Nwe;Hanwu Sun;Haizhou Li;Susanto Rahardja

  • Affiliations:
  • Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632;Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632;Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632;Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation data set for the Multiple Distant Microphone (MDM) task. Our implementation has three components: initial clustering, non-speech removal, and cluster purification. Initial clusters are generated using Direction of Arrival (DOA) information and bootstrap clustering. The non-speech removal component uses multiple GMM modeling for speech/non-speech classification. In addition, a novel system fusion strategy that uses information from the Receiver Operating Characteristic (ROC) curve is proposed for this component. Finally, a consensus clustering approach, together with an iterative GMM clustering method, is used for speaker cluster purification. The system achieves an overall diarization error rate (DER) of 10.81%.
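
For readers unfamiliar with the metric reported above, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below is a simplified, frame-level illustration of that definition and is not the paper's scoring procedure: the function name `frame_der`, the use of `None` for non-speech frames, and the toy label sequences are all assumptions made for this example. It also ignores overlapping speech and the scoring collar that the official NIST tools apply, and it finds the speaker mapping by brute force, which is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate (simplified illustration).

    Each sequence holds one label per frame: a speaker id, or None for
    non-speech. Overlapping speech and scoring collars are not handled.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    frames = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in frames)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    # Speech in the reference that the system labeled as non-speech.
    missed = sum(r is not None and h is None for r, h in frames)
    # Non-speech in the reference that the system labeled as speech.
    false_alarm = sum(r is None and h is not None for r, h in frames)

    # Frames where both sides agree that someone is speaking.
    both = [(r, h) for r, h in frames if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Pad with None so surplus hypothesis speakers can stay unmapped.
    padded = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Brute-force search for the one-to-one speaker mapping that
    # maximizes the number of correctly attributed frames.
    best_correct = 0
    for assignment in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    speaker_error = len(both) - best_correct
    return (missed + false_alarm + speaker_error) / total_speech


# Toy example (hypothetical labels, not data from the paper).
reference = ["A", "A", "A", "B", "B", None, None, "B", "B", "A"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, None, "y", "x"]
print(f"DER = {frame_der(reference, hypothesis):.3f}")
```

In this toy case there are 8 reference speech frames, with 1 missed frame, 1 false-alarm frame, and 1 speaker-confusion frame under the best mapping (x to A, y to B), so the function returns 3/8. A production scorer would work on time-stamped segments rather than fixed frames.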