Entity-based cross-document coreferencing using the Vector Space Model

  • Authors:
  • Amit Bagga; Breck Baldwin

  • Affiliations:
  • Duke University, Durham, NC; University of Pennsylvania, Philadelphia, PA

  • Venue:
  • COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
  • Year:
  • 1998

Quantified Score

Hi-index 0.01

Abstract

Cross-document coreference occurs when the same person, place, event, or concept is discussed in more than one text source. Computer recognition of this phenomenon is important because it helps break "the document boundary" by allowing a user to examine information about a particular entity from multiple text sources at the same time. In this paper we describe a cross-document coreference resolution algorithm that uses the Vector Space Model to resolve ambiguities between people who share the same name. We also describe a scoring algorithm for evaluating the cross-document coreference chains produced by our system, and we compare this scoring algorithm to the one used in the MUC-6 (within-document) coreference task.
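
As a rough illustration of how a Vector Space Model can separate people who share a name, the sketch below represents each document's text about a name as a bag-of-words vector and groups documents whose cosine similarity clears a threshold. The abstract does not specify the term weighting, the threshold, or the clustering rule, so the raw term frequencies, the `threshold` value, the greedy single-link grouping, and the function names `tf_vector`, `cosine`, and `cluster_mentions` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_mentions(summaries, threshold=0.3):
    """Greedy single-link grouping (an assumed rule, not taken from the paper).

    Each summary joins the first existing cluster that contains at least one
    member with similarity >= threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of summary indices.
    """
    vectors = [tf_vector(s) for s in summaries]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if any(cosine(vec, vectors[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical text about "John Smith" drawn from three made-up documents.
summaries = [
    "John Smith chairman of General Motors announced new car plans",
    "General Motors chairman John Smith spoke about car sales",
    "John Smith the striker scored twice for the football club",
]
clusters = cluster_mentions(summaries, threshold=0.3)
print(clusters)
```

With these made-up inputs, the two summaries about the chairman share most of their vocabulary and end up in one group, while the summary about the striker overlaps only on the name itself and falls below the threshold. A practical system would likely need better tokenization and term weighting than this sketch uses.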