Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation

  • Authors:
  • Eugene Chin Koh; Hanwu Sun; Tin Lay Nwe; Trung Hieu Nguyen; Bin Ma; Eng-Siong Chng; Haizhou Li; Susanto Rahardja

  • Affiliations:
  • School of Computer Engineering, Nanyang Technological University (NTU), Singapore 639798 and Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613; Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613; Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613; School of Computer Engineering, Nanyang Technological University (NTU), Singapore 639798; Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613; School of Computer Engineering, Nanyang Technological University (NTU), Singapore 639798; Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613; Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Abstract

This paper describes the I2R/NTU system submitted for the Multiple Distant Microphone (MDM) task of the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation. In our system, speaker turn detection and clustering are performed using Direction of Arrival (DOA) information. The resulting speaker clusters are then purified by GMM modeling of acoustic features. As a final step, non-speech and silence segments are removed. Our system achieved a competitive overall diarization error rate (DER) of 15.32% on the RT-07 evaluation task.
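
The abstract does not say how DOA estimates are turned into speaker turns and clusters, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes per-frame DOA angles in degrees that do not wrap around (for example 0 to 180 from a linear array), and it uses a simple greedy nearest-centroid rule. The function names `cluster_doa` and `labels_to_turns`, the 15-degree tolerance, and the frame length are all hypothetical choices made for illustration. GMM-based purification and non-speech removal are not shown.

```python
from typing import List, Tuple


def cluster_doa(angles: List[float], tolerance_deg: float = 15.0) -> List[int]:
    """Greedily assign each frame's DOA angle to the nearest cluster centroid.

    A new cluster is opened when no centroid lies within tolerance_deg.
    Assumption: angles are in degrees and do not wrap around.
    """
    centroids: List[float] = []  # running mean angle of each cluster
    counts: List[int] = []       # number of frames assigned to each cluster
    labels: List[int] = []
    for angle in angles:
        best = None
        best_dist = tolerance_deg
        for k, centroid in enumerate(centroids):
            dist = abs(angle - centroid)
            if dist <= best_dist:
                best, best_dist = k, dist
        if best is None:
            # No existing cluster is close enough: treat as a new speaker.
            centroids.append(angle)
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            centroids[best] += (angle - centroids[best]) / counts[best]
        labels.append(best)
    return labels


def labels_to_turns(labels: List[int], frame_sec: float = 0.01) -> List[Tuple[float, float, int]]:
    """Merge runs of identical labels into (start_sec, end_sec, speaker) turns."""
    turns: List[Tuple[float, float, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the input or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            turns.append((start * frame_sec, i * frame_sec, labels[start]))
            start = i
    return turns


# Toy input: two sources near 30 and 120 degrees (made-up values).
toy_angles = [30.0, 32.0, 29.0, 120.0, 118.0, 121.0, 31.0, 30.0]
toy_labels = cluster_doa(toy_angles)  # [0, 0, 0, 1, 1, 1, 0, 0]
toy_turns = labels_to_turns(toy_labels, frame_sec=1.0)
```

In this toy run a speaker turn boundary falls wherever the cluster label changes, so turn detection and clustering come out of the same pass. A real system would need smoothing of noisy DOA estimates and handling of overlapping speech, neither of which this sketch attempts.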