Robust projected clustering

  • Authors:
  • Gabriela Moise;Jörg Sander;Martin Ester

  • Affiliations:
  • University of Alberta, Department of Computing Science, T6G 2E8, Edmonton, AB, Canada;University of Alberta, Department of Computing Science, T6G 2E8, Edmonton, AB, Canada;Simon Fraser University, School of Computing Science, T6G 2E8, Burnaby, BC, Canada

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Projected clustering partitions a data set into several disjoint clusters, plus outliers, so that each cluster exists in a subspace. Subspace clustering enumerates clusters of objects in all subspaces of a data set, and it tends to produce many overlapping clusters. Such algorithms have been extensively studied for numerical data, but only a few have been proposed for categorical data. Typical drawbacks of existing projected and subspace clustering algorithms for numerical or categorical data are that they rely on parameters whose appropriate values are difficult to set appropriately or that they are unable to identify projected clusters with few relevant attributes. We present P3C, a robust algorithm for projected clustering that can effectively discover projected clusters in the data while minimizing the number of required parameters. P3C does not need the number of projected clusters as input, and can discover, under very general conditions, the true number of projected clusters. P3C is effective in detecting very low-dimensional projected clusters embedded in high dimensional spaces. P3C positions itself between projected and subspace clustering in that it can compute both disjoint or overlapping clusters. P3C is the first projected clustering algorithm for both numerical and categorical data.