Constraint-based clustering in large databases

  • Authors:
  • Anthony K. H. Tung;Raymond T. Ng;Laks V. S. Lakshmanan;Jiawei Han

  • Venue:
  • ICDT '01 Proceedings of the 8th International Conference on Database Theory
  • Year:
  • 2001

Abstract

Constrained clustering, i.e., finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. We develop a scalable constrained clustering algorithm that first finds an initial solution satisfying the user-specified constraints and then refines that solution by moving objects between clusters only when the constraints remain satisfied. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how the algorithm scales to large data sets using the heuristic of micro-cluster sharing. Experiments demonstrate the effectiveness and efficiency of the heuristics.
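The general idea of starting from a feasible solution and refining it with constraint-preserving moves can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes one simple constraint (every cluster must keep at least `min_size` objects), uses a plain k-means-style reassignment in place of pivot movement, and omits deadlock resolution and micro-cluster sharing entirely. The function `constrained_refine` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def constrained_refine(points: Sequence[Point], k: int, min_size: int,
                       max_iter: int = 100) -> List[int]:
    """Illustrative sketch (assumed constraint: each cluster keeps
    at least `min_size` points). Returns one cluster label per point."""
    n = len(points)
    if min_size < 1 or k * min_size > n:
        raise ValueError("constraint cannot be satisfied")

    # Initial feasible solution: round-robin assignment gives every
    # cluster at least floor(n / k) >= min_size points.
    labels = [i % k for i in range(n)]

    for _ in range(max_iter):
        sizes = [labels.count(c) for c in range(k)]
        centers = [centroid([p for p, l in zip(points, labels) if l == c])
                   for c in range(k)]
        moved = False
        for i, p in enumerate(points):
            cur = labels[i]
            best = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            # Confined move: allowed only if the source cluster still
            # satisfies the size constraint after losing this point.
            if best != cur and sizes[cur] > min_size:
                labels[i] = best
                sizes[cur] -= 1
                sizes[best] += 1
                moved = True
        if not moved:
            break  # no feasible improving move remains
    return labels
```

Calling `constrained_refine(points, k=2, min_size=2)` on a small list of 2-D tuples returns one label per point, and no cluster ever drops below two members. A point that would prefer another cluster but cannot leave without violating the constraint simply stays put in this sketch; handling such blocked moves well is the part that the abstract's deadlock-resolution phase addresses.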