The TNO speaker diarization system for NIST RT05s meeting data

Authors: David A. van Leeuwen
Affiliations: TNO Human Factors, Soesterberg, The Netherlands
Venue: MLMI'05: Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
Year: 2005

Abstract

The TNO speaker diarization system is based on a standard BIC segmentation and clustering algorithm. Because correct speech detection appears to be essential for the NIST Rich Transcription speaker diarization evaluation measure, we have also developed a speech activity detector (SAD). The SAD decodes the speech signal using two Gaussian mixture models, one trained on silence and one on speech. It was trained only on AMI development test data and performed quite well in the evaluation on all 5 meeting locations, with a SAD error rate of 5.0%. For the speaker clustering algorithm we optimized the BIC penalty parameter λ to 14, which is quite high compared with the theoretical value of 1. The final speaker diarization error rate was evaluated at 35.1%.
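
To illustrate the role of the penalty parameter λ, the following is a minimal sketch of one common form of the BIC merge criterion, not the TNO implementation. It assumes each cluster is modelled by a single diagonal-covariance Gaussian (a simplification chosen here so the example needs only the standard library; the abstract does not specify the model), and the function names `log_det_diag_cov` and `delta_bic` are hypothetical. Under this sign convention, a positive value favours keeping two clusters separate and a negative value favours merging them, so a larger λ makes merging more likely.

```python
import math
import random


def log_det_diag_cov(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance of the frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(var)
    return total


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modelling x and y separately versus as one cluster.

    Assumption: one diagonal-covariance Gaussian per cluster, so each model
    has 2 * dim free parameters (dim means plus dim variances).
    """
    x, y = list(x), list(y)
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    dim = len(x[0])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (
        n * log_det_diag_cov(x + y)
        - n_x * log_det_diag_cov(x)
        - n_y * log_det_diag_cov(y)
    )
    # Complexity penalty for the extra Gaussian, scaled by lambda.
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return gain - penalty


if __name__ == "__main__":
    # Toy 2-dimensional "features"; real systems would use acoustic features.
    rng = random.Random(0)
    same_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    same_b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    other = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(200)]
    for lam in (1.0, 14.0):
        print(lam, delta_bic(same_a, same_b, lam), delta_bic(same_a, other, lam))
```

In this sketch, agglomerative clustering would repeatedly merge the pair of clusters with the lowest Delta-BIC and stop once no pair has a negative value. Raising λ from 1 to 14 enlarges the penalty term by the same factor, which pushes the procedure toward fewer, larger speaker clusters.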