Progress in the AMIDA Speaker Diarization System for Meeting Data

Authors:
David A. van Leeuwen;Matej Konečný
Affiliations:
TNO Human Factors, Soesterberg, The Netherlands 3769 ZG;TNO Human Factors, Soesterberg, The Netherlands 3769 ZG
Venue:
Multimodal Technologies for Perception of Humans
Year:
2008




Abstract

In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription 2007 evaluation for conference room data. We present it in the context of the history of this system and of other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. Our best system achieves a Diarization Error Rate (DER) of 17.0%, with overlapping speech included in the scoring. However, we find that even a slight alteration of the Speech Activity Detection models has a large impact on the speaker DER.
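The abstract does not state the exact BIC criterion or parameter values used in the AMIDA system. As a minimal sketch, assuming the standard Delta-BIC test of Chen and Gopalakrishnan with one full-covariance Gaussian per segment of feature frames (for example MFCC vectors), the decision at the core of BIC segmentation and clustering could look like this. The function names and the penalty weight `lam` are illustrative assumptions, not taken from the paper; `lam` is an example of the kind of tunable parameter the abstract says the authors try to minimize.

```python
import numpy as np


def log_det_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of the rows of x."""
    cov = np.cov(x, rowvar=False, bias=True)
    sign, logdet = np.linalg.slogdet(np.atleast_2d(cov))
    if sign <= 0:
        raise ValueError("singular covariance: segment too short for a full-covariance model")
    return float(logdet)


def delta_bic(x: np.ndarray, y: np.ndarray, lam: float = 1.0) -> float:
    """Delta-BIC between two segments of feature frames (rows = frames).

    Positive values favour two separate Gaussians (a speaker change, keep apart);
    negative values favour a single Gaussian (same speaker, merge).
    `lam` is a hypothetical penalty weight, not a value from the paper.
    """
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]
    # Log-likelihood ratio: one Gaussian for the pooled data versus one per segment.
    r = 0.5 * (n * log_det_cov(z) - n1 * log_det_cov(x) - n2 * log_det_cov(y))
    # Complexity penalty for the extra mean (d) and covariance (d(d+1)/2) parameters.
    p = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return r - lam * p


# Synthetic usage: two segments from the same distribution, and one with a shifted mean.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 12))
b = rng.normal(0.0, 1.0, size=(500, 12))
c = rng.normal(3.0, 1.0, size=(500, 12))
print(delta_bic(a, b))  # same source: expected to be negative (merge)
print(delta_bic(a, c))  # different source: expected to be positive (keep apart)
```

In an agglomerative clustering loop of this kind, the pair of clusters with the lowest Delta-BIC would be merged repeatedly until no pair has a negative value, which gives a stopping rule without a separately tuned cluster count.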