Profile based cross-document coreference using kernelized fuzzy relational clustering

  • Authors:
  • Jian Huang;Sarah M. Taylor;Jonathan L. Smith;Konstantinos A. Fotiadis;C. Lee Giles

  • Affiliations:
  • Pennsylvania State University, University Park, PA;Advanced Technology Office, Lockheed Martin IS&GS, Arlington, VA;Advanced Technology Office, Lockheed Martin IS&GS, Arlington, VA;Advanced Technology Office, Lockheed Martin IS&GS, Arlington, VA;Pennsylvania State University, University Park, PA

  • Venue:
  • ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
  • Year:
  • 2009

Abstract

Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross-document coreference approach that leverages entity profiles, which are constructed with information extraction tools and reconciled by a within-document coreference module. We propose to match the profiles using a learned ensemble distance function composed of a suite of similarity specialists. We develop a kernelized soft relational clustering algorithm that uses the learned distance function to partition the entities into fuzzy sets of identities. We compare the kernelized clustering method with a popular fuzzy relational clustering algorithm (FRC) and show a 5% improvement in coreference performance. Evaluation of our proposed methods on a large benchmark disambiguation collection shows that they compare favorably with the top runs in the SemEval evaluation.
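
The pipeline outlined in the abstract (combine specialist scores into one distance, then softly cluster profiles using only pairwise relations) can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic kernel fuzzy c-means that works on a Gaussian kernel matrix derived from pairwise distances. The function names, the fixed specialist weights (learned in the paper, hand-set here), the kernel width `sigma`, the fuzzifier `m`, and the toy distance matrix are all assumptions made for illustration.

```python
import math

def ensemble_distance(scores, weights):
    """Weighted average of per-specialist distances (weights are hand-set here)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def gaussian_kernel(dist, sigma=0.5):
    """Turn a pairwise distance matrix into a Gaussian kernel matrix."""
    n = len(dist)
    return [[math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2)) for j in range(n)]
            for i in range(n)]

def kernel_fuzzy_cluster(K, init_u, m=2.0, iters=50, eps=1e-9):
    """Generic kernel fuzzy c-means using only the kernel matrix K.

    init_u[c][i] is the initial membership of item i in cluster c
    (each column sums to 1). Returns the updated membership matrix.
    """
    n = len(K)
    u = [row[:] for row in init_u]
    for _ in range(iters):
        d2 = []  # d2[c][i]: squared feature-space distance of item i to center c
        for row in u:
            powered = [x ** m for x in row]
            total = sum(powered)
            w = [x / total for x in powered]  # center as a convex mix of items
            self_term = sum(w[j] * w[l] * K[j][l]
                            for j in range(n) for l in range(n))
            d2.append([
                max(K[i][i] - 2 * sum(w[j] * K[i][j] for j in range(n))
                    + self_term, eps)
                for i in range(n)
            ])
        # Standard fuzzy c-means membership update from the distances.
        new_u = [[0.0] * n for _ in u]
        for i in range(n):
            inv = [d2[c][i] ** (-1.0 / (m - 1)) for c in range(len(u))]
            norm = sum(inv)
            for c in range(len(u)):
                new_u[c][i] = inv[c] / norm
        u = new_u
    return u

# Toy data (hypothetical): profiles 0 and 1 are close, as are profiles 2 and 3.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
init = [[0.6, 0.5, 0.5, 0.4],
        [0.4, 0.5, 0.5, 0.6]]
memberships = kernel_fuzzy_cluster(gaussian_kernel(toy_dist), init)
```

Because the update needs only the kernel matrix, the sketch never requires vector representations of the profiles, which is the sense in which such clustering is relational. Each column of `memberships` gives one profile's degrees of membership across the candidate identities.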