Identifying synonymous expressions from a bilingual corpus for example-based machine translation

  • Authors:
  • Mitsuo Shimohata; Eiichiro Sumita

  • Affiliations:
  • ATR Spoken Language Translation Research Laboratories, Kyoto, Japan; ATR Spoken Language Translation Research Laboratories, Kyoto, Japan

  • Venue:
  • COLING-MTIA '02 Proceedings of the 2002 COLING workshop on Machine translation in Asia - Volume 16
  • Year:
  • 2002

Abstract

Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus, and the output is then generated from the translations of those similar sentences. Therefore, a similarity measure between the input sentence and each sentence in the bilingual corpus is important for EBMT. If some similar sentences are missed during retrieval, the quality of translations drops. In this paper, we describe a method to acquire synonymous expressions from a bilingual corpus and utilize them to expand the retrieval of similar sentences. Synonymous expressions are acquired from differences between synonymous sentences, which are clustered by the equivalence of their translations. Our method has the advantage of not relying on rich linguistic knowledge, such as sentence structure and dictionaries. We demonstrate the effect of applying our method to a simple EBMT system.
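
The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes that "equivalence of translations" means an exact string match on the target side, that sentences are tokenized on whitespace, and that differences are found with a standard sequence diff (Python's difflib). The function names and the toy corpus are hypothetical, and the target strings are placeholders standing in for real translations.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def cluster_by_translation(pairs):
    """Group source sentences that share an identical target translation.

    Assumption: "equivalent" translations are exact string matches.
    """
    clusters = defaultdict(list)
    for source, target in pairs:
        if source not in clusters[target]:
            clusters[target].append(source)
    # Only clusters with at least two distinct sentences can yield differences.
    return [group for group in clusters.values() if len(group) > 1]


def differing_spans(sentence_a, sentence_b):
    """Return (span_a, span_b) pairs where two tokenized sentences differ."""
    tokens_a, tokens_b = sentence_a.split(), sentence_b.split()
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep substitutions only; pure insertions or deletions have no counterpart.
        if tag == "replace":
            spans.append((" ".join(tokens_a[i1:i2]), " ".join(tokens_b[j1:j2])))
    return spans


def synonymous_expressions(pairs):
    """Collect candidate synonymous expressions from a bilingual corpus."""
    found = set()
    for group in cluster_by_translation(pairs):
        for sentence_a, sentence_b in combinations(group, 2):
            for left, right in differing_spans(sentence_a, sentence_b):
                found.add(tuple(sorted((left, right))))
    return found


# Toy corpus: (source sentence, placeholder for its translation).
corpus = [
    ("i would like a room", "T1"),
    ("i want a room", "T1"),
    ("where is the station", "T2"),
]
candidates = synonymous_expressions(corpus)
```

In this sketch, the two sentences sharing the placeholder translation "T1" form one cluster, and their only differing span yields the candidate pair "want" / "would like". A retrieval step could then treat such pairs as interchangeable when comparing an input sentence against the corpus.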