Panlingual lexical translation via probabilistic inference

Authors:
Mausam;Stephen Soderland;Oren Etzioni;Daniel S. Weld;Kobi Reiter;Michael Skinner;Marcus Sammer;Jeff Bilmes
Affiliations:
Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, United States (all authors)
Venue:
Artificial Intelligence
Year:
2010

Abstract

This paper introduces a novel approach to the task of lexical translation between languages for which no translation dictionaries are available. We build a massive translation graph, automatically constructed from over 630 machine-readable dictionaries and Wiktionaries. In this graph, each node denotes a word in some language and each edge (v_i, v_j) denotes a word sense shared by v_i and v_j. Our current graph contains over 10,000,000 nodes and expresses more than 60,000,000 pairwise translations. The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper describes a series of probabilistic inference algorithms that solve this problem at varying precision and recall levels. All algorithms enable us to quantify our confidence in a translation derived from the graph, and thus to trade precision for recall. We compile the results of our best inference algorithm to yield PanDictionary, a novel multilingual dictionary. PanDictionary contains more than four times as many translations as the largest Wiktionary at precision 0.90, and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
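
The transitive inference problem described in the abstract can be illustrated with a small sketch. The code below is not one of the paper's inference algorithms; it is a deliberately simple stand-in that combines two-hop paths with a noisy-OR rule, assuming that paths are independent. The class name `TranslationGraph`, the `(language, word)` node encoding, the method names, and all edge probabilities are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Node = Tuple[str, str]  # hypothetical encoding: (language code, word)


class TranslationGraph:
    """Toy undirected translation graph (illustrative only).

    Each edge stores an assumed probability that its two words share a sense.
    """

    def __init__(self) -> None:
        self.edges: Dict[Node, Dict[Node, float]] = defaultdict(dict)

    def add_translation(self, a: Node, b: Node, prob: float) -> None:
        # Translation edges are treated as symmetric in this sketch.
        self.edges[a][b] = prob
        self.edges[b][a] = prob

    def score(self, source: Node, target: Node) -> float:
        """Confidence that target translates source, via direct or two-hop paths."""
        neighbors = self.edges.get(source, {})
        if target in neighbors:
            return neighbors[target]
        # Noisy-OR over pivots: assumes the two-hop paths are independent,
        # which is a simplification made for this example.
        miss = 1.0
        for pivot, p1 in neighbors.items():
            p2 = self.edges.get(pivot, {}).get(target)
            if p2 is not None:
                miss *= 1.0 - p1 * p2
        return 1.0 - miss

    def translations(self, source: Node, language: str,
                     threshold: float) -> Dict[Node, float]:
        """Candidates in `language` whose score meets `threshold`."""
        candidates = set()
        for pivot in self.edges.get(source, {}):
            if pivot[0] == language:
                candidates.add(pivot)
            for node in self.edges.get(pivot, {}):
                if node != source and node[0] == language:
                    candidates.add(node)
        scored = {n: self.score(source, n) for n in candidates}
        return {n: s for n, s in scored.items() if s >= threshold}


# Made-up probabilities; "spring" is polysemous (season vs. coil).
graph = TranslationGraph()
spring = ("en", "spring")
graph.add_translation(spring, ("fr", "printemps"), 0.9)
graph.add_translation(spring, ("it", "primavera"), 0.8)
graph.add_translation(spring, ("fr", "ressort"), 0.9)
graph.add_translation(("fr", "printemps"), ("nl", "lente"), 0.9)
graph.add_translation(("it", "primavera"), ("nl", "lente"), 0.9)
graph.add_translation(("fr", "ressort"), ("nl", "veer"), 0.9)

strict = graph.translations(spring, "nl", threshold=0.9)
loose = graph.translations(spring, "nl", threshold=0.8)
```

With these made-up numbers, the candidate supported by two pivot paths scores higher than the one supported by a single path, so raising the threshold keeps fewer but better-supported translations and lowering it admits more candidates. This mirrors, in miniature, the precision-for-recall trade mentioned in the abstract.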