In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription 2007 evaluation for conference room data. We present it in the context of the history of this system and of other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that a slight change to the Speech Activity Detection models has a large impact on the speaker DER.
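To make the DER figure concrete, the following is a minimal sketch of a frame-level DER computation that counts overlapping speech. It is not the scoring tool used in the evaluation. It assumes fixed-length frames, that each frame is given as a set of active speaker labels, and that hypothesis labels have already been mapped to reference labels; the function name `diarization_error_rate` and the toy data are hypothetical.

```python
def diarization_error_rate(reference, hypothesis):
    """Return the DER in percent for two equal-length frame sequences.

    Each frame is a set of speaker labels that are active in that frame.
    Assumption: hypothesis labels are already mapped onto reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    error_frames = 0   # missed speech + false alarm + speaker error
    scored_frames = 0  # total reference speaker time, counting overlap

    for ref, hyp in zip(reference, hypothesis):
        correct = len(ref & hyp)
        # With overlap, a frame with two reference speakers counts twice.
        error_frames += max(len(ref), len(hyp)) - correct
        scored_frames += len(ref)

    if scored_frames == 0:
        raise ValueError("reference contains no speech")
    return 100.0 * error_frames / scored_frames


# Illustrative toy data only, not taken from the evaluation.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
print(diarization_error_rate(reference, hypothesis))
```

In this toy example the second frame is a speaker error, the third misses one of two overlapping speakers, and the last is a false alarm. The false alarm also illustrates why the Speech Activity Detection stage feeds directly into the DER: speech detected where the reference has none adds to the error total without adding to the scored speech time.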