Statistical methods for extracting translational equivalents from non-parallel corpora hold the promise of ensuring the required coverage and domain customisation of lexicons, as well as accelerating their compilation and maintenance. A challenge for these methods is posed by rare words and expressions, which have low corpus frequencies. However, it is precisely rare words, such as newly introduced terminology and named entities, that are of main interest for practical lexical acquisition. In this article, we study ways of improving the extraction of low-frequency equivalents from bilingual comparable corpora. Our work is carried out within the general framework that discovers equivalences between words of different languages from similarities between their occurrence patterns in the respective monolingual corpora. We develop a method that aims to compensate for the insufficient corpus evidence available for rare words: prior to measuring cross-language similarities, the method uses same-language corpus data to model the co-occurrence vectors of rare words by predicting their unseen co-occurrences and smoothing rare, unreliable ones. Our experimental evaluation demonstrates that the proposed method delivers a consistent and significant improvement over the conventional approach to this task.
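The abstract does not spell out the smoothing procedure, so the following is only a minimal sketch of one way the idea could look, not the authors' actual method. It assumes a nearest-neighbour scheme: the co-occurrence vector of a rare word is interpolated with a similarity-weighted average of the vectors of its most similar same-language words, which fills in unseen co-occurrences and dampens unreliable ones. The function names, the parameters `k` and `lam`, the use of cosine similarity, and the toy data are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(ctx, 0.0) for ctx, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_vector(word, vectors, k=2, lam=0.5):
    """Smooth the co-occurrence vector of `word` (hypothetical helper).

    The k most similar same-language words supply a similarity-weighted
    estimate, which is interpolated with the observed vector:
        smoothed = lam * observed + (1 - lam) * neighbour_estimate
    """
    target = vectors[word]
    # Rank all other words by similarity to the target and keep the top k.
    neighbours = sorted(
        ((cosine(target, vec), other)
         for other, vec in vectors.items() if other != word),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return dict(target)  # no usable neighbours: keep the observed vector

    # Similarity-weighted average of the neighbours' vectors.
    estimate = {}
    for sim, other in neighbours:
        for ctx, weight in vectors[other].items():
            estimate[ctx] = estimate.get(ctx, 0.0) + (sim / total) * weight

    contexts = set(target) | set(estimate)
    return {
        ctx: lam * target.get(ctx, 0.0) + (1 - lam) * estimate.get(ctx, 0.0)
        for ctx in contexts
    }


# Toy monolingual co-occurrence data (invented for illustration only).
vectors = {
    "stent": {"artery": 1.0},                      # rare word, sparse evidence
    "catheter": {"artery": 4.0, "insert": 3.0},
    "implant": {"artery": 2.0, "surgery": 5.0},
    "banana": {"fruit": 6.0},
}

smoothed = smooth_vector("stent", vectors, k=2, lam=0.5)
print(sorted(smoothed))
```

In this toy setting the rare word "stent" acquires non-zero weights for "insert" and "surgery", contexts it was never observed with, while the unrelated "fruit" context stays absent. The smoothed vectors would then be passed to the usual cross-language similarity step in place of the raw ones.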