A comparative study on translation units for bilingual lexicon extraction

  • Authors:
  • Kaoru Yamamoto;Yuji Matsumoto;Mihoko Kitamura

  • Affiliations:
  • Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan

  • Venue:
  • DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14
  • Year:
  • 2001

Abstract

This paper presents ongoing research on the automatic extraction of bilingual lexicons from English-Japanese parallel corpora. Its main objective is to examine several N-gram models for generating translation units for bilingual lexicon extraction. Three N-gram models are compared: a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram). An experiment with 10,000 English-Japanese parallel sentences shows that Chunk-bound N-gram gives the best result in both accuracy (83%) and coverage (60%), improving on the previously proposed baseline model by approximately 13% in accuracy and 5-9% in coverage.
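
The abstract names the models without defining them, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that a Bound-length N-gram is any contiguous word sequence up to a fixed length, and that a Chunk-bound N-gram is the same thing restricted so that no unit crosses a chunk boundary. The function names, the `max_n` parameter, and the hand-chunked example sentence are all hypothetical; the Dependency-linked model is left out because it would need a dependency parse.

```python
from typing import List, Tuple


def bound_length_ngrams(tokens: List[str], max_n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous token sequence of length 1..max_n (assumed baseline)."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def chunk_bound_ngrams(chunks: List[List[str]], max_n: int) -> List[Tuple[str, ...]]:
    """Return contiguous sequences that never cross a chunk boundary (assumed reading)."""
    units: List[Tuple[str, ...]] = []
    for chunk in chunks:
        # Generating N-grams per chunk keeps every unit inside one chunk.
        units.extend(bound_length_ngrams(chunk, max_n))
    return units


# Hypothetical, hand-chunked sentence; not taken from the paper's corpus.
chunks = [["the", "parallel", "corpus"], ["was", "aligned"]]
tokens = [tok for chunk in chunks for tok in chunk]

baseline_units = bound_length_ngrams(tokens, max_n=3)
chunk_units = chunk_bound_ngrams(chunks, max_n=3)
```

Under these assumptions the chunk-bound candidates are a subset of the baseline candidates: a unit such as `("corpus", "was")`, which straddles the two chunks, appears only in `baseline_units`. Fewer boundary-crossing candidates is one intuitive reason such a model could be more accurate, though the abstract itself does not state the cause.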