A comparative study on translation units for bilingual lexicon extraction

Authors:
Kaoru Yamamoto;Yuji Matsumoto;Mihoko Kitamura
Affiliations:
Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
Venue:
DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14
Year:
2001

Citing 6

Translating collocations for bilingual lexicons: a statistical approach
Computational Linguistics

Models of translational equivalence among words
Computational Linguistics

An algorithm for finding noun phrase correspondences in bilingual corpora
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics

Learning bilingual collocations by word-level sorting
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

Finding structural correspondences from bilingual parsed corpus for corpus-based translation
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2

Acquisition of phrase-level bilingual correspondence using dependency structure
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2

Cited 2

Learning translations of named-entity phrases from parallel corpora
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1

Learning sequence-to-sequence correspondences from parallel corpora via sequential pattern mining
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3

Abstract

This paper presents ongoing research on the automatic extraction of bilingual lexicons from English-Japanese parallel corpora. The main objective is to examine various N-gram models for generating translation units for bilingual lexicon extraction. Three N-gram models are compared: a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram). An experiment with 10,000 English-Japanese parallel sentences shows that Chunk-bound N-gram produces the best result in terms of both accuracy (83%) and coverage (60%), improving on the previously proposed baseline model by approximately 13% in accuracy and by 5-9% in coverage.
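
To make the contrast between two of the models named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a Bound-length N-gram is any contiguous word sequence up to a fixed maximum length, and that a Chunk-bound N-gram is the same kind of sequence restricted so that it never crosses a chunk boundary. The abstract gives no formal definitions, so both readings, the function names, and the toy sentence with its chunking are illustrative assumptions. The Dependency-linked model is omitted because it would require a parser.

```python
from typing import List, Tuple


def bound_length_ngrams(tokens: List[str], max_n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous N-gram of length 1..max_n in the token list.

    Assumed reading of the baseline "Bound-length N-gram" model.
    """
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def chunk_bound_ngrams(chunks: List[List[str]], max_n: int) -> List[Tuple[str, ...]]:
    """Return contiguous N-grams that stay inside a single chunk.

    Assumed reading of the "Chunk-bound N-gram" model; the chunking itself
    is taken as given (for example, from a hypothetical shallow parser).
    """
    units: List[Tuple[str, ...]] = []
    for chunk in chunks:
        units.extend(bound_length_ngrams(chunk, max_n))
    return units


# Toy sentence with a hand-made chunking (an assumption for illustration).
chunks = [["the", "bilingual", "lexicon"], ["is", "extracted"]]
tokens = [word for chunk in chunks for word in chunk]

flat = bound_length_ngrams(tokens, max_n=2)
bounded = chunk_bound_ngrams(chunks, max_n=2)

# N-grams that straddle a chunk boundary appear only in the baseline set.
crossing = [unit for unit in flat if unit not in bounded]
print(crossing)
```

Under these assumptions, the chunk-bound set is a subset of the baseline set: candidate units such as ("lexicon", "is") that span two chunks are excluded before any translation pairing is attempted.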