Splitting input sentence for machine translation using language model with sentence similarity

Authors:
Takao Doi; Eiichiro Sumita
Affiliations:
ATR Spoken Language Translation Research Laboratories, Kyoto, Japan; ATR Spoken Language Translation Research Laboratories, Kyoto, Japan
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

To improve the translation quality of corpus-based MT systems for speech translation, splitting the input sentence is a promising technique. Most previous methods split sentences using N-gram clues. In this paper, to supplement N-gram-based splitting, we introduce an additional clue: sentence similarity based on edit distance. Our method generates sentence-splitting candidates using N-grams and then selects the best candidate by measuring sentence similarity. We conducted experiments with two EBMT systems, one using phrases and the other using whole sentences as the translation unit. Translation results under various conditions were evaluated with objective measures and a subjective measure. The experimental results show that the proposed method is beneficial for both systems.
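The two-stage idea in the abstract (generate split candidates with an N-gram model, then choose among them by edit-distance similarity) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. It assumes a toy add-one-smoothed bigram model trained on a tiny in-memory corpus, at most one split point per sentence, word-level edit distance normalized by the longer sequence, and a candidate's similarity taken as the mean of its parts' best corpus matches. All function names (`train_bigram`, `candidates`, `split_sentence`, etc.) and the beam size `k` are hypothetical.

```python
from collections import Counter
import math

# Assumed sentence-boundary markers for the toy bigram model.
BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized corpus sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = [BOS] + sent.split() + [EOS]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of one sentence (boundaries added here)."""
    toks = [BOS] + words + [EOS]
    vocab = len(unigrams)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(toks, toks[1:])
    )


def candidates(words, unigrams, bigrams, k):
    """Stage 1: score 'no split' and every single split point; keep the k best by LM score."""
    cands = [(log_prob(words, unigrams, bigrams), [words])]
    for i in range(1, len(words)):
        left, right = words[:i], words[i:]
        score = log_prob(left, unigrams, bigrams) + log_prob(right, unigrams, bigrams)
        cands.append((score, [left, right]))
    cands.sort(key=lambda c: c[0], reverse=True)
    return cands[:k]


def edit_distance(a, b):
    """Word-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def similarity(part, corpus_tokens):
    """Best normalized similarity (1 - distance / longer length) of part to any corpus sentence."""
    return max(
        1 - edit_distance(part, s) / max(len(part), len(s)) for s in corpus_tokens
    )


def split_sentence(sentence, corpus, k=3):
    """Stage 2: among the LM's top-k candidates, pick the one whose parts best match the corpus."""
    unigrams, bigrams = train_bigram(corpus)
    corpus_tokens = [s.split() for s in corpus]
    words = sentence.split()

    def mean_similarity(cand):
        parts = cand[1]
        return sum(similarity(p, corpus_tokens) for p in parts) / len(parts)

    # Ties on similarity fall back to the LM score.
    best = max(
        candidates(words, unigrams, bigrams, k),
        key=lambda c: (mean_similarity(c), c[0]),
    )
    return [" ".join(p) for p in best[1]]


if __name__ == "__main__":
    corpus = ["i would like a room", "how much is it", "thank you very much"]
    print(split_sentence("i would like a room how much is it", corpus))
```

With the three-sentence toy corpus above, the bigram model on its own ranks the unsplit input highest; it is the similarity stage that selects `['i would like a room', 'how much is it']`, because both parts match corpus sentences exactly. A realistic system would use a much larger language model, could allow several split points, and would use whatever similarity measure the paper actually defines.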