The Rich Transcription 2007 Meeting Recognition Evaluation

  • Authors:
  • Jonathan G. Fiscus (National Institute of Standards and Technology, Gaithersburg, MD 20899)
  • Jerome Ajot (National Institute of Standards and Technology, Gaithersburg, MD 20899, and Systems Plus, Inc., Rockville, MD 20850)
  • John S. Garofolo (National Institute of Standards and Technology, Gaithersburg, MD 20899)

  • Venue:
  • Multimodal Technologies for Perception of Humans
  • Year:
  • 2008

Abstract

We present the design and results of the Spring 2007 (RT-07) Rich Transcription Meeting Recognition Evaluation, the fifth in a series of community-wide evaluations of language technologies in the meeting domain. For 2007, we supported three evaluation tasks: Speech-To-Text (STT) transcription, "Who Spoke When" Diarization (SPKR), and Speaker Attributed Speech-To-Text (SASTT). The SASTT task, which combines the STT and SPKR tasks, was a new evaluation task. The test data consisted of three test sets: Conference Meetings, Lecture Meetings, and Coffee Breaks from lecture meetings. The Coffee Break data was included as a new test set this year. Twenty-one research sites materially contributed to the evaluation by providing data or building systems. The lowest STT word error rates with up to four simultaneous speakers in the multiple distant microphone condition were 40.6%, 49.8%, and 48.4% for the conference, lecture, and coffee break test sets, respectively. For the SPKR task, the lowest diarization error rates for all speech in the multiple distant microphone condition were 8.5%, 25.8%, and 25.5% for the conference, lecture, and coffee break test sets, respectively. For the SASTT task, the lowest speaker attributed word error rates for segments with up to three simultaneous speakers in the multiple distant microphone condition were 40.3%, 59.3%, and 68.4% for the conference, lecture, and coffee break test sets, respectively.
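The word error rates reported above follow the standard definition: the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the system hypothesis, divided by the number of reference words. As a rough illustration (not the NIST scoring tool, which additionally handles alignment of overlapping speakers and speaker attribution), a minimal word-level Levenshtein sketch might look like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein edit distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a three-word reference gives WER = 1/3
print(wer("the meeting started", "the meetings started"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why error rates near or above 50%, as seen on the distant-microphone lecture data, are possible.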