Panlingual lexical translation via probabilistic inference

  • Authors:
  • Mausam;Stephen Soderland;Oren Etzioni;Daniel S. Weld;Kobi Reiter;Michael Skinner;Marcus Sammer;Jeff Bilmes

  • Affiliations:
  • Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, United States (all authors)

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2010

Abstract

This paper introduces a novel approach to the task of lexical translation between languages for which no translation dictionaries are available. We build a massive translation graph, automatically constructed from over 630 machine-readable dictionaries and Wiktionaries. In this graph each node denotes a word in some language and each edge $(v_i, v_j)$ denotes a word sense shared by $v_i$ and $v_j$. Our current graph contains over 10,000,000 nodes and expresses more than 60,000,000 pairwise translations. The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper describes a series of probabilistic inference algorithms that solve this problem at varying precision and recall levels. All algorithms enable us to quantify our confidence in a translation derived from the graph, and thus trade precision for recall. We compile the results of our best inference algorithm to yield PanDictionary, a novel multilingual dictionary. PanDictionary contains more than four times as many translations as the largest Wiktionary at precision 0.90, and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
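
The transitive inference problem in the abstract can be illustrated with a toy sketch. This is not one of the paper's inference algorithms. It is a minimal example under stated assumptions: every edge carries a made-up probability that its two words share a sense, two-hop paths are treated as independent, and their evidence is combined with a noisy-or. The data and the function names (`build_graph`, `two_hop_scores`, `filter_by_confidence`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: nodes are (word, language) pairs. Each edge carries
# an assumed probability that the two words share a sense. There is no direct
# French-Spanish dictionary, so fr -> es translations must be inferred.
EDGES = [
    (("printemps", "fr"), ("spring", "en"), 0.9),
    (("spring", "en"), ("primavera", "es"), 0.9),
    (("spring", "en"), ("resorte", "es"), 0.9),   # "spring" as a metal coil
    (("printemps", "fr"), ("lente", "nl"), 0.9),
    (("lente", "nl"), ("primavera", "es"), 0.9),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> {neighbor: probability}."""
    graph = defaultdict(dict)
    for u, v, p in edges:
        graph[u][v] = p
        graph[v][u] = p
    return graph


def two_hop_scores(graph, source, target_lang):
    """Score candidates reachable as source -> pivot -> candidate.

    Assumption: each path's probability is the product of its edge
    probabilities, and paths are independent, so they combine by noisy-or.
    """
    miss = defaultdict(lambda: 1.0)  # probability that every path fails
    for pivot, p1 in graph[source].items():
        for cand, p2 in graph[pivot].items():
            if cand != source and cand[1] == target_lang:
                miss[cand] *= 1.0 - p1 * p2
    return {cand: 1.0 - m for cand, m in miss.items()}


def filter_by_confidence(scores, threshold):
    """Keep candidates at or above the threshold, sorted for stable output."""
    return sorted(c for c, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    scores = two_hop_scores(graph, ("printemps", "fr"), "es")
    for cand, score in sorted(scores.items()):
        print(cand, round(score, 4))
    print(filter_by_confidence(scores, 0.9))
```

In this toy graph the polysemous pivot "spring" links French "printemps" to Spanish "resorte", which is a wrong translation, with a score of about 0.81. The correct "primavera" is supported by two paths and scores about 0.96. Raising the confidence threshold to 0.9 drops the wrong candidate, and lowering it admits more candidates, which is the precision-for-recall trade the abstract mentions.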