Extracting bilingual dictionary from comparable corpora with dependency heterogeneity

Authors:
Kun Yu;Junichi Tsujii
Affiliations:
The University of Tokyo, Bunkyo-ku, Tokyo, Japan;The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Venue:
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Year:
2009

Abstract

This paper proposes an approach to extracting a bilingual dictionary from comparable corpora. The approach is based on the observation that a word and its translation share similar dependency relations. Experimental results on 250 randomly selected translation pairs show that the proposed approach significantly outperforms the traditional context-based approach, which uses a bag of words around translation candidates.