Towards building a multilingual semantic network: identifying interlingual links in Wikipedia

  • Authors:
  • Bharath Dandala; Rada Mihalcea; Razvan Bunescu

  • Affiliations:
  • University of North Texas, Denton, TX; University of North Texas, Denton, TX; Ohio University, Athens, Ohio

  • Venue:
  • SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
  • Year:
  • 2012

Abstract

Wikipedia is a Web-based, freely available multilingual encyclopedia, built collaboratively by thousands of contributors. Wikipedia articles on the same topic in different languages are connected via interlingual (or translational) links. These links are an excellent resource for obtaining lexical translations and for building multilingual dictionaries and semantic networks. Because these links are built manually, many are missing or simply wrong. This paper describes a supervised learning method for generating new links and detecting existing incorrect links. Since no dataset is available for evaluating the resulting interlingual links, we create our own gold standard by sampling translational links from four language pairs using distance heuristics. We manually annotate the sampled translational links and use them to evaluate the output of our method for automatic link detection and correction.