Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces

  • Authors:
  • Su Yan;Hai Wang;Dongwon Lee;C. Lee Giles

  • Affiliations:
  • College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA;College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA;College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA;College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

Clustering high-dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high-dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster-membership constraints. In particular, we project the high-dimensional data onto a much lower-dimensional subspace in which the rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed on this subspace to construct more informative pairwise distances. We also propose propagating constraints locally to further improve the informativeness of pairwise distances. Evaluated on two real benchmark data sets, the new methods show substantial improvement using only limited prior knowledge.
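The projection step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a generic constraint-guided linear projection that chooses directions maximizing the spread of cannot-link pairs minus a weighted spread of must-link pairs (the weight `beta` is an assumed parameter), and it solves the resulting symmetric eigenproblem by shifted power iteration so that only the standard library is needed. The functions `outer_sum`, `top_eigvecs`, and `constrained_projection` and the toy data are hypothetical.

```python
import math
import random

# Hypothetical sketch (not the authors' method): a constraint-guided linear
# projection that keeps cannot-link pairs apart and pulls must-link pairs together.


def outer_sum(X, pairs, dim):
    """Sum of (x_i - x_j)(x_i - x_j)^T over the given index pairs."""
    M = [[0.0] * dim for _ in range(dim)]
    for i, j in pairs:
        diff = [a - b for a, b in zip(X[i], X[j])]
        for r in range(dim):
            for c in range(dim):
                M[r][c] += diff[r] * diff[c]
    return M


def top_eigvecs(S, k, iters=500, seed=0):
    """Top-k eigenvectors (largest algebraic eigenvalues) of symmetric S."""
    n = len(S)
    # Gershgorin bound: shifting by the largest absolute row sum makes every
    # eigenvalue non-negative, so power iteration finds the algebraic maximum.
    shift = max(sum(abs(val) for val in row) for row in S)
    A = [[S[r][c] + (shift if r == c else 0.0) for c in range(n)] for r in range(n)]
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
            # Deflation: remove components along eigenvectors already found.
            for u in vecs:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        vecs.append(v)
    return vecs


def constrained_projection(X, must_link, cannot_link, k, beta=1.0):
    """Project X onto k directions maximizing cannot-link spread minus
    beta times must-link spread (beta is an assumed trade-off weight)."""
    dim = len(X[0])
    c_cl = outer_sum(X, cannot_link, dim)
    c_ml = outer_sum(X, must_link, dim)
    S = [[c_cl[r][c] - beta * c_ml[r][c] for c in range(dim)] for r in range(dim)]
    W = top_eigvecs(S, k)
    return [[sum(w[d] * x[d] for d in range(dim)) for w in W] for x in X]


# Toy data: feature 0 separates the two clusters; feature 1 is within-cluster noise.
X = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [1.0, 0.0, 0.0], [1.0, 5.0, 0.0]]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]
Z = constrained_projection(X, must_link, cannot_link, k=1)
```

On this toy data the noisy feature is suppressed: each must-linked pair collapses to a single projected value, while the cannot-linked pairs stay about 1.0 apart. In a full pipeline, metric learning and clustering would then operate on the projected coordinates `Z`; power iteration was chosen only to keep the sketch dependency-free, and a library eigensolver would be the usual choice in practice.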