The AMI speaker diarization system for NIST RT06s meeting data

Authors:
David A. van Leeuwen; Marijn Huijbregts
Affiliations:
TNO Human Factors, Soesterberg, The Netherlands; Department of EEMCS, Human Media Interaction, University of Twente, Enschede, The Netherlands
Venue:
MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
Year:
2006

Abstract

We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, we present a new analysis methodology that generalizes the Detection Error Tradeoff (DET) analysis commonly used in speaker detection tasks. The speaker diarization systems are based on the TNO and ICSI systems submitted for RT05s. In the Single Distant Microphone condition of the conference room evaluation, the SAD system performs well, with an error rate of 4.23%, and the 'HMM-BIC' SPKR system performs competitively, with an error rate of 37.2% including overlapping speech.