The impact of evaluation scenario development on the quantitative performance of speech translation systems prescribed by the SCORE framework

  • Authors:
  • Brian A. Weiss; Craig Schlenoff

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, Maryland

  • Venue:
  • PerMIS '09 Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems
  • Year:
  • 2009


Abstract

The Defense Advanced Research Projects Agency's (DARPA) Spoken Language Communication and Translation for Tactical Use (TRANSTAC) program is a focused advanced technology research and development program. Its goal is to demonstrate capabilities for rapidly developing and fielding free-form, two-way, speech-to-speech translation systems that allow speakers of different languages to communicate in real-world tactical situations without an interpreter. The National Institute of Standards and Technology (NIST), with support from the Mitre Corporation and Appen Pty Limited, has been funded by DARPA to evaluate the TRANSTAC technologies since 2006. The NIST-led Independent Evaluation Team (IET) has numerous responsibilities in this ongoing effort, including collecting and processing training data, designing and implementing performance evaluations, and analyzing the test data. To design and execute fair and relevant evaluations, the NIST IET has employed the System, Component and Operationally-Relevant Evaluation (SCORE) framework. SCORE is a unified set of criteria and tools built on the premise that, to understand how a technology would perform in its intended environment, it must be evaluated at both the component and system levels and further tested in operationally-relevant environments while capturing both quantitative and qualitative performance data. Because an evaluation goal of the TRANSTAC program is to capture quantitative performance data on the translation technologies, the IET developed and implemented SCORE-inspired live evaluation scenarios. Each of the two forms of live evaluation scenario has a distinct impact on the quantitative performance data. This paper presents the TRANSTAC program and the SCORE methodology, as well as the evaluation scenarios and their influence on system performance.