Speaker diarization: from broadcast news to lectures

  • Authors:
  • Xuan Zhu;Claude Barras;Lori Lamel;Jean-Luc Gauvain

  • Affiliations:
  • LIMSI-CNRS, France;LIMSI-CNRS, France;LIMSI-CNRS, France;LIMSI-CNRS, France

  • Venue:
  • MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents the LIMSI speaker diarization system for lecture data, in the framework of the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. This system builds upon the baseline diarization system designed for broadcast news data. The baseline system combines agglomerative clustering based on Bayesian information criterion with a second clustering using state-of-the-art speaker identification techniques. In the RT-04F evaluation, the baseline system provided an overall diarization error of 8.5% on broadcast news data. However since it has a high missed speech error rate on lecture data, a different speech activity detection approach based on the log-likelihood ratio between the speech and non-speech models trained on the seminar data was explored. The new speaker diarization system integrating this module provides an overall diarization error of 20.2% on the RT-06S Multiple Distant Microphone (MDM) data.