Evaluating speech translation systems: applying SCORE to TRANSTAC technologies

Authors:
Craig Schlenoff;Greg Sanders;Brian Weiss;Fred Proctor;Michelle Potts Steves;Ann Virts
Affiliations:
National Institute of Standards and Technology, Gaithersburg, MD;National Institute of Standards and Technology, Gaithersburg, MD;National Institute of Standards and Technology, Gaithersburg, MD;National Institute of Standards and Technology, Gaithersburg, MD;National Institute of Standards and Technology, Gaithersburg, MD;National Institute of Standards and Technology, Gaithersburg, MD
Venue:
PerMIS '09 Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems
Year:
2009

Citing 4

BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Applying SCORE to field-based performance evaluations of soldier worn sensor technologies: Field Reports
Journal of Field Robotics

The evolution of performance metrics in the RoboCup Rescue Virtual Robot Competition
PerMIS '07 Proceedings of the 2007 Workshop on Performance Metrics for Intelligent Systems

Evolution of the SCORE framework to enhance field-based performance evaluations of emerging technologies
PerMIS '08 Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems

Cited 5

The impact of evaluation scenario development on the quantitative performance of speech translation systems prescribed by the SCORE framework
PerMIS '09 Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems

The multi-relationship evaluation design framework: creating evaluation blueprints to assess advanced and intelligent technologies
Proceedings of the 10th Performance Metrics for Intelligent Systems Workshop

Lessons learned in evaluating DARPA advanced military technologies
Proceedings of the 10th Performance Metrics for Intelligent Systems Workshop

The IBM speech-to-speech translation system for smartphone: Improvements for resource-constrained tasks
Computer Speech and Language

Evaluation methodology and metrics employed to assess the TRANSTAC two-way, speech-to-speech translation systems
Computer Speech and Language

Abstract

The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program is a Defense Advanced Research Projects Agency (DARPA) advanced technology research and development program. The goal of the TRANSTAC program is to demonstrate capabilities to rapidly develop and field free-form, two-way translation systems that enable speakers of different languages to communicate with one another in real-world tactical situations without an interpreter. The National Institute of Standards and Technology (NIST), with support from MITRE and Appen Pty Ltd., has been funded to serve as the Independent Evaluation Team (IET) for the TRANSTAC program. The IET is responsible for assessing the performance of the TRANSTAC systems by designing and executing multiple TRANSTAC evaluations and analyzing their results. To accomplish this, NIST has applied the SCORE (System, Component, and Operationally Relevant Evaluations) framework. SCORE is a unified set of criteria and software tools for defining a performance evaluation approach for complex intelligent systems. It provides a comprehensive evaluation blueprint that assesses the technical performance of a system and its components by isolating variables, and that captures the end-user utility of the system in realistic use-case environments. This document describes the TRANSTAC program and explains how the SCORE framework was applied to assess the technical and utility performance of the TRANSTAC systems.