Discriminative training of clustering functions: theory and experiments with entity identification

  • Authors:
  • Xin Li; Dan Roth

  • Affiliations:
  • University of Illinois, Urbana, IL; University of Illinois, Urbana, IL

  • Venue:
  • CoNLL '05: Proceedings of the Ninth Conference on Computational Natural Language Learning
  • Year:
  • 2005

Abstract

Clustering is an optimization procedure that partitions a set of elements to optimize some criterion, based on a fixed distance metric defined between the elements. Clustering approaches have been widely applied in natural language processing, and it has been shown repeatedly that their success depends on defining a good distance metric, one that is appropriate for the task and for the clustering algorithm used. This paper develops a framework in which clustering is viewed as a learning task, and proposes a way to train a distance metric that is appropriate for the chosen clustering algorithm in the context of the given task. Experiments on the entity identification problem show significant performance improvements over state-of-the-art clustering approaches developed for this problem.
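The core idea, training the distance metric through the clustering algorithm rather than independently of it, can be illustrated with a minimal sketch. It assumes a linear distance over hypothetical pairwise features, a threshold-based single-link clustering as the fixed algorithm, and a perceptron-style update computed from the partition that algorithm actually returns. The names `distance`, `cluster`, `train`, the toy features, and the threshold are all hypothetical; this is not the exact training procedure from the paper.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def distance(w: List[float], feats: List[float]) -> float:
    """Hypothetical linear metric: weighted sum of pairwise feature values."""
    return sum(wi * fi for wi, fi in zip(w, feats))


def cluster(n: int, pair_feats: Dict[Pair, List[float]],
            w: List[float], threshold: float) -> List[int]:
    """Fixed clustering algorithm (assumed here): single-link via union-find,
    merging every pair whose learned distance falls below `threshold`."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), feats in pair_feats.items():
        if distance(w, feats) < threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]  # cluster id = root of each element


def train(examples: List[Tuple[Dict[Pair, List[float]], List[int]]],
          dim: int, threshold: float = 0.5,
          epochs: int = 10, lr: float = 0.1) -> List[float]:
    """Perceptron-style metric training driven by the clustering output."""
    w = [0.0] * dim
    for _ in range(epochs):
        for pair_feats, gold in examples:
            # Run the *same* clustering algorithm that will be used at test time.
            pred = cluster(len(gold), pair_feats, w, threshold)
            delta = [0.0] * dim
            for (i, j), feats in pair_feats.items():
                same_gold, same_pred = gold[i] == gold[j], pred[i] == pred[j]
                if same_gold and not same_pred:    # wrongly split: shrink distance
                    delta = [d - lr * f for d, f in zip(delta, feats)]
                elif same_pred and not same_gold:  # wrongly merged: grow distance
                    delta = [d + lr * f for d, f in zip(delta, feats)]
            w = [wi + di for wi, di in zip(w, delta)]
    return w


# Toy document: mentions 0,1 name one entity and mentions 2,3 another.
# Hypothetical features per pair: [surface strings differ, attributes conflict].
toy_pairs = {(0, 1): [1.0, 0.0], (2, 3): [1.0, 0.0],
             (0, 2): [1.0, 1.0], (0, 3): [1.0, 1.0],
             (1, 2): [1.0, 1.0], (1, 3): [1.0, 1.0]}
gold = [0, 0, 1, 1]
w = train([(toy_pairs, gold)], dim=2)
pred = cluster(4, toy_pairs, w, threshold=0.5)
```

With the all-zero initial metric, every pair falls below the threshold and all four mentions collapse into one cluster; the update then penalizes only the wrongly merged cross-entity pairs, after which the clustering recovers the two gold entities. The design point is that errors are measured on the partition the clustering algorithm produces, so the metric is fitted to that algorithm rather than to pairwise decisions in isolation.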