Resolving surface forms to Wikipedia topics

  • Authors:
  • Yiping Zhou;Lan Nie;Omid Rouhani-Kalleh;Flavian Vasile;Scott Gaffney

  • Affiliations:
  • Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method for disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines these features using a machine learning approach trained on automatically generated training data. On a manually labeled evaluation set of more than 1,000 news articles, our resolution model achieves 85% precision and 87.8% recall. This performance is significantly better than that of three baselines based on traditional context similarity or sense commonness measures. Our method can be applied to other languages and scales well to new entities and concepts.
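The abstract does not list the individual features or the learner. As a hedged illustration only, the sketch below shows two ideas it names: a sense-commonness baseline, which estimates how often a surface form links to each Wikipedia title, and training labels generated automatically from Wikipedia's own links. The toy anchor counts, the function names `commonness`, `most_common_sense`, and `training_examples`, and the labeling scheme are hypothetical assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (surface form, linked Wikipedia title) pairs, as they might be
# harvested from anchor text of internal Wikipedia links. Toy data only.
ANCHORS = [
    ("jaguar", "Jaguar"),
    ("jaguar", "Jaguar"),
    ("jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar"),
    ("jaguar", "Jacksonville_Jaguars"),
]


def commonness(anchors):
    """Estimate P(title | surface form) from anchor-link counts."""
    counts = defaultdict(Counter)
    for surface, title in anchors:
        counts[surface][title] += 1
    table = {}
    for surface, ctr in counts.items():
        total = sum(ctr.values())
        table[surface] = {title: c / total for title, c in ctr.items()}
    return table


def most_common_sense(table, surface):
    """Commonness baseline: pick the highest-prior sense, ignoring context."""
    senses = table.get(surface)
    if not senses:
        return None  # unseen surface form: no candidate senses
    return max(senses, key=senses.get)


def training_examples(anchors, table):
    """Auto-label candidates (assumed scheme): the title an anchor actually
    links to is a positive example; every other known sense of the same
    surface form is a negative example."""
    examples = []
    for surface, gold in anchors:
        for title in table[surface]:
            examples.append((surface, title, int(title == gold)))
    return examples


if __name__ == "__main__":
    table = commonness(ANCHORS)
    print(table["jaguar"])
    print(most_common_sense(table, "jaguar"))
    print(len(training_examples(ANCHORS, table)))
```

With these toy counts, "jaguar" links to `Jaguar` in 3 of 5 anchors, so the commonness baseline resolves every mention of "jaguar" to that title regardless of context. That context-blindness is the kind of error a learned model, combining commonness with context-based features, is meant to reduce.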