Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation

Authors:
Eugene Chin Koh (1, 2); Hanwu Sun (2); Tin Lay Nwe (2); Trung Hieu Nguyen (1); Bin Ma (2); Eng-Siong Chng (1); Haizhou Li (2); Susanto Rahardja (2)
Affiliations:
(1) School of Computer Engineering, Nanyang Technological University (NTU), Singapore 639798
(2) Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore 119613
Venue:
Multimodal Technologies for Perception of Humans
Year:
2008

Abstract

This paper describes the I2R/NTU system submitted to the Multiple Distant Microphone (MDM) task of the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation. In our system, speaker turn detection and clustering are performed using Direction of Arrival (DOA) information. The resulting speaker clusters are then purified by modeling acoustic features with Gaussian mixture models (GMMs). As a final step, non-speech and silence segments are removed. Our system achieved a competitive overall diarization error rate (DER) of 15.32% on the NIST RT-07 evaluation task.
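The abstract does not say how the DOA information is computed or clustered. As a rough illustration only, the sketch below assumes a single two-microphone pair and uses GCC-PHAT, a common time-delay estimator, to obtain one inter-channel lag per frame. Each distinct lag is then treated as one speaker cluster, and a speaker turn is reported wherever the cluster label changes. The function names (`gcc_phat_lag`, `frame_lags`, `cluster_and_segment`, `two_mic_signal`), the frame sizes, and the synthetic white-noise "speakers" are hypothetical. The GMM purification and non-speech removal stages are not shown.

```python
import numpy as np


def gcc_phat_lag(x, y, max_lag):
    """Return the integer lag (in samples) of y relative to x via GCC-PHAT.

    A positive value means the sound reaches channel y after channel x.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_lag = max(1, min(max_lag, n // 2))
    # Reorder so index 0 holds lag -max_lag and the last index holds lag +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def frame_lags(mic1, mic2, frame_len, hop, max_lag):
    """Estimate one inter-channel lag per analysis frame (hypothetical helper)."""
    return np.array([
        gcc_phat_lag(mic1[s:s + frame_len], mic2[s:s + frame_len], max_lag)
        for s in range(0, len(mic1) - frame_len + 1, hop)
    ])


def cluster_and_segment(lags):
    """Toy stand-in for DOA clustering: each distinct lag is one speaker cluster.

    Speaker turns are reported at frames where the cluster label changes.
    """
    label_of = {lag: i for i, lag in enumerate(sorted(set(lags.tolist())))}
    labels = np.array([label_of[lag] for lag in lags.tolist()])
    turns = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, turns


def two_mic_signal(rng, length, delay, margin=20):
    """Synthetic white-noise 'speaker' reaching mic 2 `delay` samples after mic 1."""
    s = rng.standard_normal(length + 2 * margin)
    mic1 = s[margin:margin + length]
    mic2 = s[margin - delay:margin - delay + length]  # mic2[n] == mic1[n - delay]
    return mic1, mic2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a1, a2 = two_mic_signal(rng, 8000, delay=3)   # hypothetical speaker A
    b1, b2 = two_mic_signal(rng, 8000, delay=-5)  # hypothetical speaker B
    mic1 = np.concatenate((a1, b1))
    mic2 = np.concatenate((a2, b2))
    lags = frame_lags(mic1, mic2, frame_len=1000, hop=1000, max_lag=10)
    labels, turns = cluster_and_segment(lags)
    print("per-frame lags:", lags.tolist())
    print("speaker turns at frames:", turns)
```

Grouping frames by exact lag only works on clean synthetic data like this. Delay estimates from real meeting recordings are noisy and come from several microphone pairs, which is one reason a second, acoustic-feature stage such as the GMM purification described above can help.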