NIST RT'05S evaluation: pre-processing techniques and speaker diarization on multiple microphone meetings

Authors:
Dan Istrate;Corinne Fredouille;Sylvain Meignier;Laurent Besacier;Jean-François Bonastre
Affiliations:
LIA-Avignon, Avignon, France;LIA-Avignon, Avignon, France;LIUM, Le Mans;CLIPS-IMAG (UJF & CNRS & INPG), Grenoble, France;LIA-Avignon, Avignon, France
Venue:
MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction
Year:
2005

Abstract

This paper presents several pre-processing techniques, coupled with three speaker diarization systems, in the framework of the NIST 2005 Spring Rich Transcription campaign (RT'05S). The pre-processing techniques compute a signal quality index that is used to build a single “virtual” signal from all the microphone recordings available for a meeting. This virtual signal is a weighted sum of the individual microphone signals, and the signal quality index is derived from a signal-to-noise ratio. Two methods are used to compute the instantaneous signal-to-noise ratio: an approach based on speech activity detection and one based on a noise spectrum estimate. The speaker diarization task is performed using systems developed by three labs: LIA, LIUM and CLIPS. Among the system submissions made by these labs, the best obtained a speaker diarization error of 24.5% for the conference subdomain and 18.4% for the lecture subdomain.
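
The weighted-sum idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes that each channel's weight is proportional to its linear signal-to-noise ratio for the current frame and that the weights are normalised to sum to one. The per-channel noise energies are taken as inputs and stand in for whichever estimator is used (speech activity detection or a noise spectrum estimate). All function and parameter names (`frame_energy`, `snr_weights`, `combine_channels`, `floor`) are hypothetical and are not taken from the paper.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def snr_weights(frames, noise_energies, floor=1e-12):
    """Return one weight per channel for the current frame.

    frames:         list of frames, one per microphone channel
    noise_energies: per-channel noise energy estimates (assumed given)

    Assumption: weights are proportional to the linear SNR and are
    normalised to sum to 1. The paper may use a different mapping.
    """
    snrs = []
    for frame, noise in zip(frames, noise_energies):
        # Estimated speech energy: total energy minus noise energy, never negative.
        signal = max(frame_energy(frame) - noise, 0.0)
        snrs.append(signal / max(noise, floor))
    total = sum(snrs)
    if total <= 0.0:
        # No channel carries usable signal: fall back to equal weights.
        return [1.0 / len(frames)] * len(frames)
    return [s / total for s in snrs]


def combine_channels(frames, weights):
    """Weighted sum of time-aligned frames, sample by sample."""
    n = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]


# Two channels observing the same frame, the second with a higher SNR.
frames = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
noise = [0.5, 0.5]
weights = snr_weights(frames, noise)
virtual = combine_channels(frames, weights)
```

In this sketch the channel with the higher estimated SNR dominates the virtual signal. The sketch also assumes the channels are already time-aligned, which is a separate problem for real multi-microphone recordings.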