Example-based machine translation using efficient sentence retrieval based on edit-distance

  • Authors:
  • Takao Doi; Hirofumi Yamamoto; Eiichiro Sumita

  • Affiliations:
  • ATR Spoken Language Communication Research Laboratories, Kyoto-fu, Japan (all three authors)

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2005

Abstract

An Example-Based Machine Translation (EBMT) system whose translation example unit is a sentence can produce accurate and natural translations, provided that translation examples sufficiently similar to the input sentence are retrieved. Such a system, however, suffers from narrow coverage. Mitigating this problem requires a large-scale parallel corpus and, in turn, an efficient method for retrieving translation examples from that corpus. The authors propose an efficient retrieval method for sentence-wise EBMT based on edit-distance. The method retrieves the sentences most similar to the input, as measured by edit-distance, without omitting any of them. It employs search-space division, word graphs, and an A* search algorithm. The performance of the EBMT system was evaluated through Japanese-to-English translation experiments using a bilingual corpus of hundreds of thousands of sentences from a travel conversation domain. The EBMT system achieved high translation quality by using a large corpus and efficient processing by using the proposed retrieval method.
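
To make the retrieval task in the abstract concrete, the following is a minimal sketch of word-level edit-distance retrieval over a list of example pairs. It is a brute-force baseline that scans every example, not the authors' method: the search-space division, word graphs, and A* search named in the abstract are not reproduced here. The function names `edit_distance` and `retrieve_nearest`, the unit edit costs, and whitespace tokenization are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance with unit costs (an assumption)
    for insertion, deletion, and substitution."""
    # prev[j] holds the distance between the first i-1 words of a
    # and the first j words of b.
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, word_b in enumerate(b, start=1):
            substitution = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,                 # delete word_a
                curr[j - 1] + 1,             # insert word_b
                prev[j - 1] + substitution,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def retrieve_nearest(
    query: str, examples: List[Tuple[str, str]]
) -> Tuple[Optional[int], List[Tuple[str, str]]]:
    """Return the minimum distance and every (source, target) pair whose
    source sentence attains it, so no closest example is omitted.

    Hypothetical helper: scans the whole list, which is exactly the cost
    an efficient retrieval method is meant to avoid on a large corpus."""
    query_words = query.split()  # whitespace tokenization is an assumption
    best: Optional[int] = None
    hits: List[Tuple[str, str]] = []
    for source, target in examples:
        d = edit_distance(query_words, source.split())
        if best is None or d < best:
            best, hits = d, [(source, target)]
        elif d == best:
            hits.append((source, target))
    return best, hits
```

Calling `retrieve_nearest(query, examples)` with a list of `(source, target)` string pairs returns the smallest distance found together with all pairs that attain it. The scan is linear in corpus size and quadratic in sentence length per comparison, which illustrates why a corpus of hundreds of thousands of sentences motivates a more efficient search.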