Compiling a massive, multilingual dictionary via probabilistic inference

Authors:
Mausam;Stephen Soderland;Oren Etzioni;Daniel S. Weld;Michael Skinner;Jeff Bilmes
Affiliations:
University of Washington, Seattle;University of Washington, Seattle;University of Washington, Seattle;University of Washington, Seattle;Google, Seattle;University of Washington, Seattle
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Year:
2009

References: 10
Cited by: 13

Building a large-scale knowledge base for machine translation

AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
Measuring index quality using random walks on the Web

WWW '99 Proceedings of the eighth international conference on World Wide Web
EuroWordNet: a multilingual database with lexical semantic networks

EuroWordNet: a multilingual database with lexical semantic networks
A Randomized Fully Polynomial Time Approximation Scheme for the All-Terminal Network Reliability Problem

SIAM Journal on Computing
Improving cross language retrieval with triangulated translation

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A word-to-word model of translational equivalence

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Information arbitrage across multi-lingual Wikipedia

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Amplifying community content creation with mixed initiative information extraction

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

A rose is a roos is a ruusu: querying translations for web image search

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Panlingual lexical translation via probabilistic inference

Artificial Intelligence
Bilingual lexicon generation using non-aligned signatures

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
BabelNet: building a very large multilingual semantic network

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
UHD: Cross-lingual word sense disambiguation using multilingual co-occurrence graphs

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
MENTA: inducing multilingual taxonomies from Wikipedia

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
PanLex and LEXTRACT: translating all words of all languages of the world

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations
Semantic relations in bilingual lexicons

ACM Transactions on Speech and Language Processing (TSLP)
Parallel corpora and WordSpace models: using a third language as an interlingua to enrich multilingual resources

International Journal of Information and Communication Technology
Analyzing methods for improving precision of pivot based bilingual dictionaries

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
The CQC algorithm: cycling in graphs to semantically enrich and enhance a bilingual dictionary

Journal of Artificial Intelligence Research
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

Artificial Intelligence
The CQC algorithm: cycling in graphs to semantically enrich and enhance a bilingual dictionary (extended abstract)

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

Can we automatically compose a large set of Wiktionaries and translation dictionaries to yield a massive, multilingual dictionary whose coverage is substantially greater than that of any of its constituent dictionaries? The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper introduces a novel algorithm that solves this problem for 10,000,000 words in more than 1,000 languages. The algorithm yields PanDictionary, a novel multilingual dictionary. PanDictionary contains more than four times as many translations as the largest Wiktionary at precision 0.90, and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
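
The transitive inference question in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each dictionary entry is an undirected edge that holds independently with a known probability, and it computes the exact chance that two words are connected by enumerating every combination of kept and dropped edges. The function names, the word labels and the edge probabilities are all hypothetical, and the sketch ignores word-sense ambiguity, which makes real translation chains less reliable than independent edges suggest.

```python
from itertools import product


def reachable(edges, source, target):
    """Return True if target can be reached from source over undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def translation_probability(edge_probs, source, target):
    """Probability that source and target are connected when every edge is
    kept independently with its own probability (an illustrative assumption).

    Enumerates all 2^E edge subsets, so it only suits very small graphs.
    """
    edges = list(edge_probs)
    total = 0.0
    for kept in product([False, True], repeat=len(edges)):
        weight = 1.0
        for edge, keep in zip(edges, kept):
            p = edge_probs[edge]
            weight *= p if keep else 1.0 - p
        live = [edge for edge, keep in zip(edges, kept) if keep]
        if reachable(live, source, target):
            total += weight
    return total


# Hypothetical words A, B, C, D with made-up edge probabilities.
single_path = {("A", "B"): 0.9, ("B", "C"): 0.8}
two_paths = {**single_path, ("A", "D"): 0.5, ("D", "C"): 0.5}

print(round(translation_probability(single_path, "A", "C"), 4))
print(round(translation_probability(two_paths, "A", "C"), 4))
```

With a single chain A-B-C the result is the product of the two edge probabilities; a second, independent chain through D raises the estimate, which is the intuition behind combining many dictionaries. Exhaustive enumeration grows exponentially with the number of edges, so a graph at the scale described in the abstract would need an approximate method instead.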