Names: a new frontier in text mining

  • Authors:
  • Frankie Patman; Paul Thompson

  • Affiliations:
  • Language Analysis Systems, Inc., Herndon, VA; Institute for Security Technology Studies, Dartmouth College, Hanover, NH

  • Venue:
  • ISI'03 Proceedings of the 1st NSF/NIJ conference on Intelligence and security informatics
  • Year:
  • 2003

Abstract

Over the past 15 years, the government has funded research in information extraction, with the goal of developing the technology to extract entities, events, and their interrelationships from free text for further analysis. A crucial component of linking entities across documents is the ability to recognize when different name strings are potential references to the same entity. Given the extraordinary range of variation that international names can take when rendered in the Roman alphabet, this is a daunting task. This paper surveys existing technologies for name matching and for accomplishing pieces of the cross-document extraction and linking task. It proposes a direction for future work in which existing entity extraction, coreference, and database name matching technologies would be harnessed for cross-document coreference and linking capabilities. The extension of name variant matching to free text will add important text mining functionality for intelligence and security informatics toolkits.