Entity resolution using search engine results

  • Authors:
  • Madian Khabsa;Pucktada Treeratpituk;C. Lee Giles

  • Affiliations:
  • The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given a set of automatically extracted entities E of size n, we would like to cluster all the various names referring to the same canonical entity together. The variations of each entity include acronyms, full name, and informal naming conventions. We propose using search engine results to cluster variations of each entity based on the URLs appearing in those results. We create a cluster C for each top search result returned by querying for the entity e ∈ E assigning e to the cluster C. Our experiments on a manually created dataset shows that our approach achieves higher precision and recall than string matching algorithm and hierarchical clustering based disambiguation methods.