How many different "John Smiths", and who are they?

  • Authors:
  • Anagha Kulkarni; Ted Pedersen

  • Affiliations:
  • Department of Computer Science, University of Minnesota, Duluth, Duluth, MN; Department of Computer Science, University of Minnesota, Duluth, Duluth, MN

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006

Abstract

In this work we propose three unsupervised measures to automatically identify the number of distinct entities that a given ambiguous name refers to in a corpus. We experiment with 22 artificially created name conflations and observe that PK2, the measure formulated as the ratio of two successive clustering criterion function values, outperforms the other two measures. We also describe a method for assigning a unique label to each discovered cluster so as to identify the underlying entity that it refers to.
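
As a rough illustration of the idea behind PK2, the sketch below computes the ratio of successive clustering criterion function values and uses it to pick a cluster count. The abstract gives only the ratio itself; the function names (`pk2_ratios`, `choose_k`), the sample criterion values, and the tolerance-based stopping rule are all assumptions made for this example, not the authors' published procedure. The sketch also assumes a criterion function that is maximized, so a ratio close to 1 means that adding a cluster brought little improvement.

```python
def pk2_ratios(crfun):
    """Return {k: crfun(k) / crfun(k - 1)} for k = 2..len(crfun).

    crfun[i] holds the criterion function value obtained with i + 1 clusters.
    """
    return {k: crfun[k - 1] / crfun[k - 2] for k in range(2, len(crfun) + 1)}


def choose_k(crfun, tol=0.05):
    """Pick a cluster count from successive-ratio values.

    ASSUMPTION (not from the paper): stop at the first k whose ratio is
    within `tol` of 1 and return k - 1, the last count that still gave
    a clear improvement.
    """
    for k, ratio in sorted(pk2_ratios(crfun).items()):
        if ratio <= 1 + tol:
            return k - 1
    # Every additional cluster still improved the criterion noticeably.
    return len(crfun)


# Hypothetical criterion values for k = 1..5 clusters (made-up numbers).
values = [10.0, 16.0, 20.0, 20.4, 20.6]
ratios = pk2_ratios(values)
best_k = choose_k(values)
```

With these made-up values the ratios drop from 1.6 at k = 2 to about 1.02 at k = 4, so the sketch settles on three clusters. A fixed tolerance is only one possible stopping rule; any threshold would need tuning on real data.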