KORE: keyphrase overlap relatedness for entity disambiguation

  • Authors:
  • Johannes Hoffart; Stephan Seufert; Dat Ba Nguyen; Martin Theobald; Gerhard Weikum

  • Affiliations:
  • Max Planck Institute for Informatics, Saarbrücken, Germany (all authors)

  • Venue:
  • Proceedings of the 21st ACM International Conference on Information and Knowledge Management
  • Year:
  • 2012

Abstract

Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
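The two ideas in the abstract, relatedness from partially overlapping weighted keyphrases and approximation with min-hash sketches, can be illustrated with a small sketch. This is not the KORE formula from the paper, which the abstract does not spell out. It is a toy stand-in that assumes each entity is a dictionary mapping keyphrases to weights, scores two phrases by the weighted Jaccard overlap of their words, and estimates plain Jaccard similarity with min-hash signatures. All function names, the best-match aggregation, and the example phrases and weights are hypothetical.

```python
import hashlib


def phrase_overlap(p, q, word_weight):
    """Weighted Jaccard overlap of two keyphrases viewed as word sets.

    Words missing from word_weight get weight 1.0 (an assumption).
    """
    p_words, q_words = set(p.split()), set(q.split())
    union = p_words | q_words
    if not union:
        return 0.0
    shared = p_words & q_words
    num = sum(word_weight.get(w, 1.0) for w in shared)
    den = sum(word_weight.get(w, 1.0) for w in union)
    return num / den


def _directed(e, f, word_weight):
    """Phrase-weighted average of each phrase's best partial match in f."""
    total = sum(e.values())
    if total == 0 or not f:
        return 0.0
    score = sum(
        weight * max(phrase_overlap(p, q, word_weight) for q in f)
        for p, weight in e.items()
    )
    return score / total


def relatedness(e, f, word_weight):
    """Toy symmetric relatedness between two keyphrase-weighted entities."""
    return 0.5 * (_directed(e, f, word_weight) + _directed(f, e, word_weight))


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """Min-hash signature: the minimum hash value per seeded hash function."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot sketch an empty token set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Hypothetical entities with made-up keyphrases and weights.
    e = {"english rock guitarist": 2.0, "led zeppelin": 1.0}
    f = {"english guitarist": 1.0, "hard rock": 1.0}
    print(relatedness(e, f, word_weight={}))

    sig_e = minhash_signature(w for p in e for w in p.split())
    sig_f = minhash_signature(w for p in f for w in p.split())
    print(estimated_jaccard(sig_e, sig_f))
```

Because the phrase-level score uses word overlap rather than exact phrase matches, "english rock guitarist" and "english guitarist" still contribute to relatedness. The signatures are fixed-length, so they can be compared without touching the full keyphrase sets, and their positions can be grouped into bands to form locality-sensitive hashing buckets. Banding is a common construction and is not necessarily the one the authors use.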