NIST RT'05S evaluation: pre-processing techniques and speaker diarization on multiple microphone meetings

  • Authors:
  • Dan Istrate;Corinne Fredouille;Sylvain Meignier;Laurent Besacier;Jean-François Bonastre

  • Affiliations:
  • LIA-Avignon, Avignon, France;LIA-Avignon, Avignon, France;LIUM, Le Mans;CLIPS-IMAG (UJF & CNRS & INPG), Grenoble, France;LIA-Avignon, Avignon, France

  • Venue:
  • MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2005

Abstract

This paper presents several pre-processing techniques coupled with three speaker diarization systems, evaluated in the framework of the NIST 2005 Spring Rich Transcription campaign (RT'05S). The pre-processing techniques compute a signal quality index that is used to build a single “virtual” signal from all the microphone recordings available for a meeting. The virtual signal is a weighted sum of the individual microphone signals, with the signal quality index derived from the signal-to-noise ratio. Two methods are used to compute the instantaneous signal-to-noise ratio: an approach based on speech activity detection and one based on a noise spectrum estimate. Speaker diarization is performed with systems developed by three labs: LIA, LIUM, and CLIPS. Among the submissions made by these labs, the best system obtained a speaker diarization error of 24.5% on the conference subdomain and 18.4% on the lecture subdomain.
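
The idea of an SNR-weighted "virtual" signal can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration: the function names (`frame_energies`, `frame_snrs`, `virtual_signal`) are hypothetical, the noise level is crudely taken as the energy of the quietest frame on each channel (standing in for the paper's speech activity detection or noise spectrum estimate), and the per-frame weights are simply the SNRs normalized to sum to one.

```python
def frame_energies(signal, frame_len):
    """Mean energy of each non-overlapping frame of `signal`."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def frame_snrs(signal, frame_len, eps=1e-12):
    """Per-frame linear SNR for one channel.

    Assumption: the quietest frame approximates the noise energy.
    The paper instead uses speech activity detection or a noise
    spectrum estimate; this is only a stand-in.
    """
    energies = frame_energies(signal, frame_len)
    noise = min(energies) + eps
    return [max(e - noise, 0.0) / noise for e in energies]


def virtual_signal(channels, frame_len):
    """Combine equal-length channels into one SNR-weighted signal.

    For each frame, every channel gets a weight proportional to its
    SNR in that frame (assumed weighting). If no channel has a
    positive SNR, all channels are weighted equally.
    """
    snrs = [frame_snrs(ch, frame_len) for ch in channels]
    n_frames = len(snrs[0])
    out = []
    for f in range(n_frames):
        frame_snr = [s[f] for s in snrs]
        total = sum(frame_snr)
        if total > 0:
            weights = [v / total for v in frame_snr]
        else:
            weights = [1.0 / len(channels)] * len(channels)
        start = f * frame_len
        for t in range(start, start + frame_len):
            out.append(sum(w * ch[t] for w, ch in zip(weights, channels)))
    return out


# Toy example: microphone A picks up a loud second frame,
# microphone B records only low-level noise throughout.
mic_a = [0.01] * 4 + [1.0] * 4
mic_b = [0.01] * 8
combined = virtual_signal([mic_a, mic_b], frame_len=4)
```

In this toy example the second frame of the combined signal is taken almost entirely from microphone A, which has the higher SNR there, while the first frame falls back to an equal-weight average because neither channel rises above its noise estimate.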