Feature Selection for Clustering - A Filter Solution

  • Authors:
  • Manoranjan Dash; Kiseok Choi; Peter Scheuermann; Huan Liu

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method to remove noisy features. In the literature there are only a few methods proposed for feature selection for clustering. Moreover, almost all of those methods are 'wrapper' techniques that require a clustering algorithm to evaluate the candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications due to its heavy reliance on clustering algorithms that require parameters such as the number of clusters, and due to the lack of suitable clustering criteria to evaluate clustering in different subspaces. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram from that of data without clusters. Using this observation, we propose an entropy measure that is low if the data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant to the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
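
The idea of scoring a feature subset by the entropy of its point-to-point distances can be illustrated with a small sketch. The abstract does not state the paper's formula, so the code below is an assumption for illustration only: it rescales pairwise Euclidean distances to [0, 1] and averages the binary entropy of each distance. The function name `distance_entropy` and the max-distance normalization are hypothetical choices, not the authors' definition, and this simple version does not reproduce the dimension-invariance property the abstract claims.

```python
import numpy as np


def distance_entropy(X):
    """Illustrative score (an assumption, not the paper's exact measure).

    Averages the binary entropy of pairwise distances rescaled to [0, 1].
    Distances near 0 (same cluster) or near 1 (different clusters)
    contribute little; distances near 0.5 contribute the most.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    rows, cols = np.triu_indices(len(X), k=1)
    d = dist[rows, cols]
    d_max = d.max()
    if d_max == 0:
        return 0.0  # all points coincide
    # Rescale to [0, 1] and clip to keep the logarithms finite.
    d = np.clip(d / d_max, 1e-12, 1 - 1e-12)
    h = -(d * np.log2(d) + (1 - d) * np.log2(1 - d))
    return float(h.mean())


rng = np.random.default_rng(0)
# Two tight, well-separated blobs versus uniformly scattered points.
clustered = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 2)),
    rng.normal(5.0, 0.05, size=(50, 2)),
])
uniform = rng.uniform(0.0, 5.0, size=(100, 2))

score_clustered = distance_entropy(clustered)
score_uniform = distance_entropy(uniform)
```

Under these assumptions, `score_clustered` comes out lower than `score_uniform`, matching the abstract's description of a measure that is low when the data has distinct clusters. A filter-style selection would compare such scores across candidate feature subsets (for example, `distance_entropy(X[:, subset])`) without running any clustering algorithm.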