Dependency-based feature selection for clustering symbolic data

  • Authors:
  • Luis Talavera

  • Affiliations:
  • Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord, Mòdul C6, Jordi Girona 1-3, 08034 Barcelona, Spain. E-mail: talavera@lsi.upc.es/ URL: ht ...

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2000

Abstract

Feature selection is a central problem in data analysis that has received a significant amount of attention from several disciplines, such as machine learning and pattern recognition. However, most of the research has been directed at supervised tasks, with little attention paid to unsupervised learning. In this paper, we introduce an unsupervised feature selection method for symbolic clustering tasks. Our method is based on the assumption that, in the absence of class labels, we can deem as irrelevant those features that exhibit low dependencies with the rest of the features. Experiments with several data sets demonstrate that the proposed approach is able to detect completely irrelevant features and that, additionally, it removes other features without significantly hurting the performance of the clustering algorithm.
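
The core assumption, that a feature with low dependency on the remaining features can be treated as irrelevant, can be illustrated with a minimal sketch. The abstract does not state which dependency measure the paper uses, so the sketch below substitutes empirical mutual information averaged over all other features; the function names, the averaging step, and the threshold parameter are all hypothetical choices made for illustration, not the author's method.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two symbolic columns.

    Assumption: mutual information stands in for the paper's dependency
    measure, which the abstract does not specify.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def dependency_scores(rows):
    """Average dependency of each feature on every other feature."""
    columns = list(zip(*rows))  # one tuple of symbols per feature
    m = len(columns)
    return [
        sum(
            mutual_information(columns[i], columns[j])
            for j in range(m)
            if j != i
        ) / (m - 1)
        for i in range(m)
    ]


def select_features(rows, threshold):
    """Indices of features whose average dependency reaches the threshold.

    The threshold is a hypothetical parameter chosen by the user.
    """
    scores = dependency_scores(rows)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy data: features 0 and 1 determine each other, feature 2 is
# statistically independent of both.
rows = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("b", "y", "p"),
    ("b", "y", "q"),
]
selected = select_features(rows, threshold=0.1)
```

On this toy data, the third feature has zero mutual information with the other two and falls below the threshold, while the first two features are retained. A pairwise measure like this one cannot detect dependencies that only appear among three or more features jointly, which is one reason a real method might use a different score.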