The Rich Transcription 2006 Spring Meeting Recognition Evaluation

  • Authors:
  • Jonathan G. Fiscus; Jerome Ajot; Martial Michel; John S. Garofolo

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, MD (all authors)

  • Venue:
  • MLMI'06: Proceedings of the Third International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006


Abstract

We present the design and results of the Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation, the fourth in a series of community-wide evaluations of language technologies in the meeting domain. For 2006, we supported three evaluation tasks in two meeting sub-domains: the Speech-To-Text (STT) transcription task, and the “Who Spoke When” and “Speech Activity Detection” diarization tasks. The meetings came from the Conference Meeting and Lecture Meeting sub-domains. The lowest STT word error rate, scoring up to four simultaneous speakers in the multiple distant microphone condition, was 46.3% for the conference sub-domain and 53.4% for the lecture sub-domain. For the “Who Spoke When” task, the lowest diarization error rates for all speech were 35.8% and 24.0% for the conference and lecture sub-domains, respectively. For the “Speech Activity Detection” task, the lowest diarization error rates were 4.3% and 8.0% for the conference and lecture sub-domains, respectively.
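
For context, the two metrics quoted above follow the standard NIST scoring definitions used throughout the Rich Transcription evaluations; the notation below (the T terms in particular) is ours, added here for clarity rather than taken from the paper.

    % Word Error Rate (WER): word substitutions (S), deletions (D), and
    % insertions (I) made by the recognizer, relative to the number of
    % words (N) in the reference transcript.
    \mathrm{WER} = \frac{S + D + I}{N}

    % Diarization Error Rate (DER): time scored as missed speech,
    % false-alarm speech, or speech attributed to the wrong speaker,
    % relative to the total scored speech time. For the "Speech Activity
    % Detection" task only speech/non-speech decisions are scored, so the
    % speaker-attribution term is effectively zero there.
    \mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{fa}} + T_{\mathrm{spkr}}}{T_{\mathrm{total}}}

This is why the “Speech Activity Detection” error rates (4.3% and 8.0%) are so much lower than the “Who Spoke When” rates: the former omits the speaker-attribution component that dominates the latter.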