One of the most difficult challenges that military personnel face when operating in foreign countries is clear and successful communication with the local population. To address this issue, the Defense Advanced Research Projects Agency (DARPA) is funding academic institutions and industrial organizations through the Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program to develop practical machine translation systems. The goal of the TRANSTAC program is to demonstrate capabilities to rapidly develop and field free-form, two-way, speech-to-speech translation systems that enable speakers of different languages to communicate with one another in real-world tactical situations without an interpreter. Evaluations of these technologies are a significant part of the program and DARPA has asked the National Institute of Standards and Technology (NIST) to lead this effort. This article presents the experimental design of the TRANSTAC evaluations and the metrics, both quantitative and qualitative, that were used to comprehensively assess the systems' performance.
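Among the quantitative metrics commonly applied to machine translation output is BLEU-style clipped n-gram precision. The sketch below is an illustrative, simplified sentence-level version of that family of metrics, not the actual scoring pipeline used in the TRANSTAC evaluations; the function names and the smoothing constant are assumptions for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision
    averaged in log space, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate the score.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Floor zero overlaps so one missing n-gram order
        # does not zero out the whole geometric mean.
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, while a candidate sharing no n-grams with the reference scores near 0; real evaluations average such scores over a corpus and pair them with the qualitative utility judgments described above.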