Extraction of lexical translations from non-aligned corpora

Authors:
Kumiko Tanaka;Hideya Iwasaki
Affiliations:
The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

A method for extracting lexical translations from non-aligned corpora is proposed to cope with the unavailability of large aligned corpora. The assumption that "translations of two co-occurring words in a source language also co-occur in the target language" is adopted and expressed in a stochastic matrix formulation. The translation matrix maps co-occurrence information from the source language into the target language. Once the ambiguity of the translational relation is resolved, this translated co-occurrence information should resemble the co-occurrence information observed directly in the target language. An algorithm for obtaining the best translation matrix is introduced. Experiments were performed to evaluate the effectiveness of the ambiguity resolution and of the dictionary refinement.
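
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: source co-occurrence is a matrix A, target co-occurrence is a matrix B, the translation matrix T is row-stochastic (each source word's translation probabilities sum to 1), and the translated co-occurrence is taken to be T^T A T. The function names, the toy numbers, and the absolute-difference distance are all hypothetical choices made for illustration; the paper's own distance measure and optimization algorithm may differ.

```python
# Toy sketch only: names, numbers, and the distance measure are illustrative
# assumptions, not the formulation used in the paper.

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def translate_cooccurrence(a, t):
    """Map source co-occurrence `a` into the target vocabulary as T^T A T."""
    return matmul(matmul(transpose(t), a), t)

def distance(p, q):
    """Sum of absolute element-wise differences (an assumed measure)."""
    return sum(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

# Source words s0, s1 co-occur with each other.
A = [[0.0, 0.5],
     [0.5, 0.0]]

# Target words t0, t1, t2: only t0 and t2 co-occur.
B = [[0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]

# Ambiguous dictionary: s0 translates to t0 or t1 with equal weight; s1 to t2.
T_ambiguous = [[0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]]

# Disambiguated dictionary: s0 translates to t0 only.
T_resolved = [[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]

d_ambiguous = distance(translate_cooccurrence(A, T_ambiguous), B)
d_resolved = distance(translate_cooccurrence(A, T_resolved), B)
print(d_ambiguous, d_resolved)
```

In this toy setting the resolved matrix reproduces the target co-occurrence more closely than the ambiguous one, which is the sense in which a search over candidate translation matrices could prefer the correct translation. How that search is carried out is not specified in the abstract and is not modeled here.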