Multi-stage Speaker Diarization for Conference and Lecture Meetings

  • Authors:
  • X. Zhu; C. Barras; L. Lamel; J-L. Gauvain

  • Affiliations:
  • Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403 and Univ Paris-Sud, Orsay, France F-91405;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403 and Univ Paris-Sud, Orsay, France F-91405;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403;Spoken Language Processing Group, LIMSI-CNRS, Orsay cedex, France 91403

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Abstract

This paper presents the LIMSI RT-07S speaker diarization system for conference and lecture meetings. The system builds upon the RT-06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on the Bayesian information criterion (BIC) with a second clustering stage using state-of-the-art speaker identification (SID) techniques. Since the baseline system yields a high speech activity detection (SAD) error of around 10% on lecture data, several acoustic representations with various normalization techniques are investigated within the framework of a log-likelihood ratio (LLR) based speech activity detector. Universal background models (UBMs) trained on the different types of acoustic features are also examined in the SID clustering stage. All SAD acoustic models and UBMs are trained using forced-alignment segmentations of the conference data. The diarization system integrating the new SAD models and UBM gives comparable results on both the RT-07S conference and lecture evaluation data for the multiple distant microphone (MDM) condition.
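
To make the BIC-based agglomerative clustering stage more concrete, the following is a minimal sketch of a textbook merge criterion, not the LIMSI implementation. It assumes each cluster is modeled by a single full-covariance Gaussian and uses the standard global penalty term. The function name `delta_bic` and the penalty weight `lam` are hypothetical, chosen for illustration only; the abstract does not specify the exact BIC variant or the features used.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Textbook delta-BIC between two clusters of feature frames.

    x, y: arrays of shape (n_frames, dim), one row per acoustic frame.
    lam:  penalty weight (hypothetical tuning parameter).
    A negative value favors merging the two clusters (one Gaussian
    explains the data well enough); a positive value favors keeping
    them separate.
    """
    n_x, dim = x.shape
    n_y = y.shape[0]
    n = n_x + n_y

    def log_det_cov(frames):
        # Maximum-likelihood (biased) full covariance of the frames.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    # Free parameters of one full-covariance Gaussian: mean + covariance.
    n_params = dim + 0.5 * dim * (dim + 1)
    penalty = 0.5 * lam * n_params * np.log(n)

    return (0.5 * n * log_det_cov(np.vstack([x, y]))
            - 0.5 * n_x * log_det_cov(x)
            - 0.5 * n_y * log_det_cov(y)
            - penalty)
```

In an agglomerative loop, one would typically compute this value for every pair of clusters, merge the pair with the lowest value, and stop once no pair yields a negative value. The second, SID-based clustering stage described in the abstract is not covered by this sketch.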