Abstract: Globalization, along with international crises and disasters, spurs the need for cross-lingual verbal communication across myriad languages. This is reflected in intense ongoing research activity in the field of speech translation. However, deployable speech translation systems are still developed for only a handful of languages. The prohibitively high cost of acquiring sufficient amounts of suitable speech translation training data is one of the main reasons for this situation. A new language pair or domain is typically considered for speech translation development only after a major need for cross-lingual verbal communication has arisen, justifying the high development costs. In such situations, communication has to rely on the help of interpreters, while massive data collections for system development are conducted in parallel. We propose an alternative to this time-consuming and costly parallel effort. By training speech translation directly on audio recordings of interpreter-mediated communication, we omit most of the manual transcription effort and all of the manual translation effort that characterize traditional speech translation development.