In this paper we present the ICSI speaker diarization system submitted to the NIST Rich Transcription evaluation (RT06s) [1], conducted in the meetings domain. The system builds on our RT05s system, which uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to decide which pairs of clusters to merge and when to stop merging. In this year's system we have eliminated any remaining need for training data, thereby increasing robustness. Our primary system introduces several improvements over last year's. First, we use a new training-free speech/non-speech detection algorithm. Second, we introduce a new algorithm for system initialization. Third, we apply a frame-purification algorithm to increase cluster discriminability. Finally, we use inter-channel delays as additional features. We explain each of these improvements and report our system's results on the official evaluation data using both hand-aligned references and forced alignments. We also analyze some of the results and propose further improvements.
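To make the merging criterion concrete, the sketch below shows the penalty-free "modified BIC" idea behind agglomerative diarization: the merged cluster is modeled with as many Gaussian components as the two cluster models combined, so the usual BIC complexity penalty cancels and a positive score favors merging. This is an illustrative simplification, not the ICSI implementation: it uses diagonal-covariance GMMs with a few hand-rolled EM iterations, single-component cluster models, and a toy feature matrix per cluster; the function names (`fit_loglik`, `delta_bic`) are invented for this example.

```python
import numpy as np

def logpdf_diag(x, mu, var):
    # log N(x | mu, diag(var)) evaluated for each row of x
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=1)

def fit_loglik(x, n_comp, n_iter=20, seed=0):
    """Total log-likelihood of x under an n_comp diagonal-covariance GMM
    trained with a few EM iterations (a stand-in for the system's GMMs)."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    # Initialize components from an interleaved random split of the data.
    idx = rng.permutation(n)
    mus = np.array([x[idx[k::n_comp]].mean(axis=0) for k in range(n_comp)])
    vrs = np.array([x[idx[k::n_comp]].var(axis=0) + 1e-6 for k in range(n_comp)])
    w = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: per-component log joint and responsibilities.
        lp = np.stack([np.log(w[k]) + logpdf_diag(x, mus[k], vrs[k])
                       for k in range(n_comp)])          # shape (n_comp, n)
        lse = np.logaddexp.reduce(lp, axis=0)
        r = np.exp(lp - lse)
        # M-step: re-estimate weights, means, variances.
        nk = r.sum(axis=1) + 1e-10
        w = nk / n
        mus = (r @ x) / nk[:, None]
        for k in range(n_comp):
            vrs[k] = (r[k] @ (x - mus[k]) ** 2) / nk[k] + 1e-6
    return lse.sum()

def delta_bic(x1, x2):
    """Modified BIC merge score: the merged model gets the combined number
    of components (here 1 + 1 = 2), so the complexity penalty cancels.
    Positive values favor merging the two clusters."""
    return (fit_loglik(np.vstack([x1, x2]), n_comp=2)
            - fit_loglik(x1, n_comp=1) - fit_loglik(x2, n_comp=1))
```

At each clustering iteration the pair with the highest `delta_bic` is merged, and clustering stops once no pair scores positive, which is how the same criterion serves both the merge decision and the stopping decision without tuned thresholds.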