Splitting long or ill-formed input for robust spoken-language translation

  • Authors:
  • Osamu Furuse; Setsuo Yamada; Kazuhide Yamamoto

  • Affiliations:
  • ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan (all authors)

  • Venue:
  • COLING '98: Proceedings of the 17th International Conference on Computational Linguistics, Volume 1
  • Year:
  • 1998

Abstract

This paper proposes an input-splitting method for translating spoken language, which contains many long or ill-formed expressions. The method splits input into well-balanced translation units based on a semantic distance calculation. Splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation is formed by concatenating the partial translations of the split units. The method can be incorporated into frameworks such as TDMT that use left-to-right parsing and a score for each substructure. Experimental results show that the method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.
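
The general idea of splitting by substructure score and concatenating partial translations can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. In particular, `span_cost`, `translate_unit`, and `split_penalty` are hypothetical names; `span_cost(i, j)` stands in for whatever semantic distance a parser would assign to the best substructure over words `i..j`, and returns `None` when no substructure covers that span. The sketch picks the lowest-cost cover with a left-to-right dynamic program and does not model the "well-balanced" criterion.

```python
from typing import Callable, List, Optional, Tuple


def split_input(
    words: List[str],
    span_cost: Callable[[int, int], Optional[float]],
    split_penalty: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) spans that cover `words` with the lowest total cost.

    `span_cost` is a hypothetical scorer (an assumption of this sketch): it
    returns a distance for words[start:end], or None if the span has no parse.
    `split_penalty` is an assumed charge for each additional unit.
    """
    n = len(words)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[j]: lowest cost of covering words[:j]
    back = [-1] * (n + 1)   # back[j]: start index of the last unit in that cover
    best[0] = 0.0

    for j in range(1, n + 1):  # sweep end positions from left to right
        for i in range(j):
            if best[i] == inf:
                continue  # words[:i] cannot be covered at all
            cost = span_cost(i, j)
            if cost is None:
                continue  # no substructure over words[i:j]
            total = best[i] + cost + (split_penalty if i > 0 else 0.0)
            if total < best[j]:
                best[j] = total
                back[j] = i

    if best[n] == inf:
        raise ValueError("no sequence of parsable units covers the input")

    # Follow the back-pointers from the end to recover the chosen units.
    spans: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = back[j]
        spans.append((i, j))
        j = i
    spans.reverse()
    return spans


def translate_split(
    words: List[str],
    span_cost: Callable[[int, int], Optional[float]],
    translate_unit: Callable[[List[str]], str],
    split_penalty: float = 1.0,
) -> str:
    """Translate each unit separately and concatenate the partial results."""
    spans = split_input(words, span_cost, split_penalty)
    return " ".join(translate_unit(words[i:j]) for i, j in spans)
```

Because a unit is accepted whenever some span has a score, an input with no single full parse can still yield output as long as it can be covered by smaller parsable units, which is the intuition behind avoiding null outputs.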