Identifying co-referential names across large corpora

  • Authors:
  • Levon Lloyd; Andrew Mehler; Steven Skiena

  • Affiliations:
  • Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY; Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY; Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY

  • Venue:
  • CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

A single logical entity can be referred to by several different names across a large text corpus. We present an algorithm for finding all such co-reference sets in a large corpus. The algorithm involves three steps: morphological similarity detection, contextual similarity analysis, and clustering. Finally, we present experimental results on a large corpus of real news text to analyze the performance of our techniques.
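The abstract names the three stages but not their internals. The following is a minimal Python sketch under our own assumptions, not the authors' method. Morphological similarity is approximated by a token rule: every token of the shorter name must match a token of the longer one, either exactly or as an initial. Contextual similarity is cosine similarity over context-word counts. Clustering is single-link, implemented with union-find. The functions `morphologically_similar`, `cosine`, and `coreference_sets`, the 0.5 threshold, and the toy context counts are all hypothetical.

```python
from collections import Counter
from math import sqrt


def token_match(x: str, y: str) -> bool:
    # Hypothetical rule: exact token match, or one token is an initial of the other.
    return x == y or (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))


def morphologically_similar(a: str, b: str) -> bool:
    # Assumed heuristic: every token of the shorter name matches some token of the longer.
    ta = a.lower().replace(".", "").split()
    tb = b.lower().replace(".", "").split()
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return all(any(token_match(s, t) for t in long_) for s in short)


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two bag-of-words context vectors.
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def coreference_sets(contexts: dict[str, Counter], threshold: float = 0.5) -> list[set[str]]:
    # Single-link clustering: merge two names when they are both morphologically
    # and contextually similar; union-find takes the transitive closure.
    names = sorted(contexts)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if morphologically_similar(a, b) and cosine(contexts[a], contexts[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


if __name__ == "__main__":
    # Toy context counts, invented purely for illustration.
    contexts = {
        "George W. Bush": Counter({"president": 5, "white": 3, "house": 3}),
        "Bush": Counter({"president": 4, "house": 2, "texas": 1}),
        "Reggie Bush": Counter({"football": 6, "touchdown": 4}),
        "Bill Clinton": Counter({"president": 3, "arkansas": 2}),
    }
    for group in coreference_sets(contexts):
        print(sorted(group))
```

On the toy data, "Bush" merges with "George W. Bush" because the names match morphologically and share context. "Reggie Bush" also passes the morphological test against "Bush", but it stays separate because its context does not overlap. The all-pairs loop is quadratic in the number of names. A corpus-scale version would need some way to restrict which pairs are compared, and this sketch does not attempt that.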