Unsupervised personal name disambiguation

  • Authors:
  • Gideon S. Mann;David Yarowsky

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD;Johns Hopkins University, Baltimore, MD

  • Venue:
  • CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering technique over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clustering of named entities are then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated based on both a test set of handlabeled multi-referent personal names and via automatically generated pseudonames.