The impact of evaluation scenario development on the quantitative performance of speech translation systems prescribed by the SCORE framework

  • Authors:
  • Brian A. Weiss; Craig Schlenoff

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, Maryland

  • Venue:
  • PerMIS '09 Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems
  • Year:
  • 2009


Abstract

The Defense Advanced Research Projects Agency's (DARPA) Spoken Language Communication and Translation for Tactical Use (TRANSTAC) program is a focused advanced technology research and development program. Its goal is to demonstrate capabilities for rapidly developing and fielding free-form, two-way, speech-to-speech translation systems that allow speakers of different languages to communicate in real-world tactical situations without an interpreter. The National Institute of Standards and Technology (NIST), with support from the Mitre Corporation and Appen Pty Limited, has been funded by DARPA to evaluate the TRANSTAC technologies since 2006. The NIST-led Independent Evaluation Team (IET) has numerous responsibilities in this ongoing effort, including collecting and processing training data, designing and implementing performance evaluations, and analyzing the test data. To design and execute fair and relevant evaluations, the NIST IET has employed the System, Component and Operationally-Relevant Evaluation (SCORE) framework. SCORE is a unified set of criteria and tools built on the premise that, to understand how a technology would perform in its intended environment, it must be evaluated at both the component and system levels and further tested in operationally-relevant environments while capturing both quantitative and qualitative performance data. Because an evaluation goal of the TRANSTAC program is to capture quantitative performance data on the translation technologies, the IET developed and implemented SCORE-inspired live evaluation scenarios. Each of the two forms of live evaluation scenario has a distinct impact on the quantitative performance data. This paper presents the TRANSTAC program and the SCORE methodology, as well as the evaluation scenarios and their influence on system performance.