Clustering with relative constraints

  • Authors:
  • Eric Yi Liu;Zhaojun Zhang;Wei Wang

  • Affiliations:
  • University of North Carolina at Chapel Hill, Chapel Hill, NC, USA;University of North Carolina at Chapel Hill, Chapel Hill, NC, USA;University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

  • Venue:
  • Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent studies have suggested using relative distance comparisons as constraints to represent domain knowledge. A natural extension to relative comparisons is the combination of two comparisons defined on the same set of three instances. Constraints in this form, termed Relative Constraints, provide a unified knowledge representation for both partitional and hierarchical clusterings. But many key properties of relative constraints remain unknown. In this paper, we answer the following important questions that enable the broader application of relative constraints in general clustering problems: " Feasibility: Does there exist a clustering that satisfies a given set of relative constraints? (consistency of constraints) "Completeness: Given a set of consistent relative constraints, how can one derive a complete clustering without running into dead-ends? " Informativeness: How can one extract the most informative relative constraints from given knowledge sources? We show that any hierarchical domain knowledge can be easily represented by relative constraints. We further present a hierarchical algorithm that finds a clustering satisfying all given constraints in polynomial time. Experiments showed that our algorithm achieves significantly higher accuracy than the existing metric learning approach based on relative comparisons.