Foundations of statistical natural language processing
Foundations of statistical natural language processing
Information Retrieval
A systematic comparison of various statistical alignment models
Computational Linguistics
Models of translational equivalence among words
Computational Linguistics
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Dictionary-based techniques for cross-language information retrieval
Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Looking for candidate translational equivalents in specialized, comparable corpora
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
An approach based on multilingual thesauri and model combination for bilingual lexicon extraction
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Reliable measures for aligning Japanese-English news articles and sentences
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Computational Linguistics
Learning a translation lexicon from monolingual corpora
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
A geometric view on bilingual lexicon extraction from comparable corpora
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Mining multilingual topics from wikipedia
Proceedings of the 18th international conference on World wide web
Feature-based method for document alignment in comparable news corpora
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Cross-language linking of news stories on the web using interlingual topic modelling
Proceedings of the 2nd ACM workshop on Social web search and mining
Explicit versus latent concept models for cross-language information retrieval
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Bilingual lexicon generation using non-aligned signatures
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Revisiting context-based projection methods for term-translation spotting in comparable corpora
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Clustering comparable corpora for bilingual lexicon extraction
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Identifying word translations from comparable corpora using latent topic models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
A language modeling approach for extracting translation knowledge from comparable corpora
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a significant margin. In addition, we show how to use the algorithm for the construction of high-quality initial seed lexicons of translations.