A Robust Method for Speech Signal Time-Delay Estimation in Reverberant Rooms
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) - Volume 1
Robust speaker segmentation for meetings: the ICSI-SRI spring 2005 diarization system
MLMI'05 Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
Further progress in meeting recognition: the ICSI-SRI spring 2005 speech-to-text evaluation system
MLMI'05 Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
The 2004 ICSI-SRI-UW meeting recognition system
MLMI'04 Proceedings of the First International Conference on Machine Learning for Multimodal Interaction
Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information
IEEE Transactions on Computers
The LIA RT'07 Speaker Diarization System
Multimodal Technologies for Perception of Humans
A speaker diarization method based on the probabilistic fusion of audio-visual location information
Proceedings of the 2009 International Conference on Multimodal Interfaces
IEEE Transactions on Audio, Speech, and Language Processing
A review on speaker diarization systems and approaches
Speech Communication
Real-time audio-visual analysis for multiperson videoconferencing
Advances in Multimedia
We present a method to extract a speaker-turn segmentation from multiple distant microphones (MDM) using only the delay values found via cross-correlation between the available channels. The method is robust to the number of speakers (which is unknown to the system), the number of channels, and the acoustics of the room. The delays between channels are processed and clustered to obtain a segmentation hypothesis. We obtained a 31.2% diarization error rate (DER) on NIST's RT05s MDM conference-room evaluation set. On an MDM subset of NIST's RT04s development set, we obtained 36.93% DER and 35.73% DER*. Compared with the results of Ellis and Liu [8], who also used between-channel differences on the same data, this represents a 43% relative improvement in error rate.
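The abstract's first step, estimating the delay between a pair of channels via cross-correlation, is commonly done with the GCC-PHAT weighting, which is favored in reverberant rooms because it whitens the spectrum and sharpens the correlation peak. The sketch below is an illustrative NumPy implementation of that pairwise delay estimate only, not the paper's full pipeline (the delay post-processing and clustering stages are not reproduced); the function name and parameters are our own.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (in samples)
    using the GCC-PHAT weighting. Illustrative sketch only."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    # Phase transform: discard magnitude, keep only phase, which
    # sharpens the correlation peak under reverberation.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Rearrange so negative delays sit to the left of index max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(cc)) - max_shift

# Example: a noise signal delayed by 40 samples on the second channel.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(40), ref))[:4096]
print(gcc_phat(sig, ref))  # → 40
```

In a multi-channel setting, such a delay would typically be computed per analysis window for each microphone pair against a reference channel, yielding the delay vectors that the paper then clusters into a segmentation hypothesis.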