Multi-stage Speaker Diarization for Conference and Lecture Meetings

Authors:
X. Zhu; C. Barras; L. Lamel; J-L. Gauvain
Affiliations:
Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403 and Univ Paris-Sud, Orsay, France F-91405;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403 and Univ Paris-Sud, Orsay, France F-91405;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403
Venue:
Multimodal Technologies for Perception of Humans
Year:
2008

Abstract

This paper presents the LIMSI RT-07S speaker diarization system for conference and lecture meetings. The system builds upon the RT-06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on the Bayesian information criterion (BIC) with a second clustering stage using state-of-the-art speaker identification (SID) techniques. Since the baseline system yields a high speech activity detection (SAD) error of around 10% on lecture data, several acoustic representations with various normalization techniques are investigated within the framework of a log-likelihood ratio (LLR) based speech activity detector. Universal background models (UBMs) trained on the different types of acoustic features are also examined in the SID clustering stage. All SAD acoustic models and UBMs are trained on forced-alignment segmentations of the conference data. The diarization system integrating the new SAD models and UBM gives comparable results on both the RT-07S conference and lecture evaluation data for the multiple distant microphone (MDM) condition.
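
As a rough illustration of the BIC-based merging decision used in agglomerative clustering, the sketch below computes the standard ΔBIC score between two clusters of feature frames. It is a minimal sketch under stated assumptions, not the configuration of the system described here: each cluster is modeled by a single diagonal-covariance Gaussian, and the names `log_det_diag_cov`, `delta_bic`, and `penalty_weight` are hypothetical and chosen for this example only.

```python
import math


def log_det_diag_cov(frames):
    """Log-determinant of a diagonal covariance: sum of log variances per dimension."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between clusters x and y (lists of feature vectors).

    A negative value favors modeling both clusters with one Gaussian (merge);
    a positive value favors keeping them separate.
    Assumption: diagonal covariance, so each Gaussian has 2 * dim parameters.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    dim = len(x[0])
    # Likelihood loss incurred by replacing two Gaussians with one.
    loss = 0.5 * (
        n * log_det_diag_cov(x + y)
        - n1 * log_det_diag_cov(x)
        - n2 * log_det_diag_cov(y)
    )
    # Penalty for the extra parameters of the two-Gaussian model.
    n_params = 2 * dim
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return loss - penalty


if __name__ == "__main__":
    # Toy data: 60 two-dimensional frames, and a copy shifted far away.
    base = [[float(i % 5), float(i % 3)] for i in range(60)]
    shifted = [[a + 100.0, b + 100.0] for a, b in base]
    print("same statistics:", delta_bic(base, base))
    print("different statistics:", delta_bic(base, shifted))
```

In an agglomerative loop, the pair of clusters with the lowest ΔBIC would be merged repeatedly until no pair scores below zero; the penalty weight controls how readily merges happen.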