Splitting long or ill-formed input for robust spoken-language translation

Authors:
Osamu Furuse;Setsuo Yamada;Kazuhide Yamamoto
Affiliations:
ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan;ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan;ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

Abstract

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The method splits the input into well-balanced translation units based on a semantic distance calculation. Splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation is formed by concatenating the partial translations of the split units. The method can be incorporated into frameworks such as TDMT that use left-to-right parsing and assign a score to each substructure. Experimental results show that the method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.
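
The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general idea of score-based splitting followed by concatenation of partial translations. It is not the authors' procedure: a small dynamic program over boundary positions stands in for the parser-integrated splitting, and every name in it (`split_input`, `translate_by_units`, `toy_score`, `max_len`, `split_penalty`) is hypothetical. The scoring function is a toy that assumes lower distance is better and lets any single word pass through, so that some output is always produced.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical scorer type: returns a distance (lower is better),
# or None when the unit cannot be analysed at all.
Scorer = Callable[[Sequence[str]], Optional[float]]


def split_input(tokens: Sequence[str], score: Scorer,
                max_len: int = 6, split_penalty: float = 1.0) -> List[List[str]]:
    """Split tokens into units with minimal total distance.

    Boundaries are considered left to right; each unit adds its
    distance plus a fixed penalty (an assumption of this sketch),
    which discourages needlessly small units.
    """
    n = len(tokens)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: cheapest cost of covering tokens[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last unit ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == inf:
                continue
            distance = score(tokens[start:end])
            if distance is None:
                continue
            cost = best[start] + distance + split_penalty
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    if best[n] == inf:
        raise ValueError("no segmentation covers the whole input")
    units: List[List[str]] = []
    end = n
    while end > 0:            # follow back-pointers from the right edge
        start = back[end]
        units.append(list(tokens[start:end]))
        end = start
    units.reverse()
    return units


def translate_by_units(tokens: Sequence[str], score: Scorer,
                       translate: Callable[[Sequence[str]], str]) -> str:
    """Translate each unit separately and concatenate the partial results."""
    return " ".join(translate(unit) for unit in split_input(tokens, score))


# Toy data, invented for illustration only.
KNOWN_UNITS = {("hello",), ("i", "want", "a", "room")}


def toy_score(unit: Sequence[str]) -> Optional[float]:
    if tuple(unit) in KNOWN_UNITS:
        return 0.1            # well-formed unit: small distance
    if len(unit) == 1:
        return 1.0            # unknown single word: passed through at a high cost
    return None               # unknown multi-word span: not analysable


def bracket(unit: Sequence[str]) -> str:
    """Placeholder translator that only marks the unit boundaries."""
    return "[" + " ".join(unit) + "]"
```

With this toy scorer, `translate_by_units("hello i want a room".split(), toy_score, bracket)` returns `"[hello] [i want a room]"`: the utterance is split into a greeting and a request, and each part is handled on its own.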