Wikification via link co-occurrence

  • Authors:
  • Zhiyuan Cai;Kaiqi Zhao;Kenny Q. Zhu;Haixun Wang

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Google Research, Mountainview, CA, USA

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Wikification, which stands for the process of linking terms in a plain text document to Wikipedia articles which represent the correct meanings of the terms, can be thought of as a generalized Word Sense Disambiguation problem. It disambiguates multi-word expressions (MWEs) in addition to single words. Existing Wikification techniques either models the context of a given term as well as the Wikipedia article as bags of words, or compute global constraints among Wikipedia concepts by the link graph or link distributions. The first method doesn't achieve good results because the MWEs can have very different meanings than its constituent words which themselves are ambiguous. The second method doesn't produce high accuracy because the link structure or link distribution is often biased or incomplete by themselves due to the fact that Wikipedia pages are often sparsely linked. In this paper, we present a simple but powerful framework of sense disambiguation using co-occurrences of Wikipedia links in the Wikipedia corpus. We propose an iterative method to enrich the sparsely-linked articles by adding more links and then use the resulting link co-occurrence matrix to disambiguate an input document by a sliding window algorithm. Our prototype system achieves 89.97% precision and 76.43% recall on average for three benchmark data and compares favorably against four state-of-the-art wikification techniques.