The rich transcription 2005 spring meeting recognition evaluation

  • Authors:
  • Jonathan G. Fiscus; Nicolas Radde; John S. Garofolo; Audrey Le; Jerome Ajot; Christophe Laprun

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, MD (all authors)

  • Venue:
  • MLMI'05: Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2005

Abstract

This paper presents the design and results of the Rich Transcription Spring 2005 (RT-05S) Meeting Recognition Evaluation. This evaluation is the third in a series of community-wide evaluations of language technologies in the meeting domain. For 2005, four evaluation tasks were supported: a speech-to-text (STT) transcription task and three diarization tasks, "Who Spoke When", "Speech Activity Detection", and "Source Localization". The latter two were first-time experimental proof-of-concept tasks and were treated as "dry runs". For the STT task, the lowest word error rate for the multiple distant microphone condition was 30.0%, which represented an impressive 33% relative reduction from the best result obtained in the last such evaluation, the Rich Transcription Spring 2004 (RT-04S) Meeting Recognition Evaluation. For the diarization "Who Spoke When" task, the lowest diarization error rate was 18.56%, which represented a 19% relative reduction from that of RT-04S.
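The year-over-year improvements in the abstract are stated as relative reductions of an error rate. As a sketch of the arithmetic (the helper function names are illustrative, not from the paper), the prior-year best rates implied by the stated figures can be back-computed from the new rate and the relative reduction:

```python
def relative_reduction(old: float, new: float) -> float:
    """Relative error-rate reduction (e.g. of WER or DER), as a fraction."""
    return (old - new) / old

def implied_prior(new: float, reduction: float) -> float:
    """Prior error rate implied by a new rate and a stated relative reduction."""
    return new / (1.0 - reduction)

# STT: 30.0% WER with a 33% relative reduction over RT-04S implies
# an RT-04S best of roughly 30.0 / (1 - 0.33) = 44.8% WER.
print(round(implied_prior(30.0, 0.33), 1))  # 44.8

# Diarization: 18.56% DER with a 19% relative reduction implies
# an RT-04S best of roughly 22.9% DER.
print(round(implied_prior(18.56, 0.19), 1))  # 22.9
```

These back-computed prior-year rates are derived solely from the rounded percentages in the abstract, so they are approximate.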