Feature selection using hierarchical feature clustering

  • Authors:
  • Huawen Liu;Xindong Wu;Shichao Zhang

  • Affiliations:
  • Zhejiang Normal University, Jinhua, China;University of Vermont, Vermont, USA;University of Technology, Sydney, Sydney, Australia

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

One of the challenges in data mining is the dimensionality of data, which is often very high and prevalent in many domains, such as text categorization and bio-informatics. The high-dimensionality of data may bring many adverse situations to traditional learning algorithms. To cope with this issue, feature selection has been put forward. Currently, many efforts have been attempted in this field and lots of feature selection algorithms have been developed. In this paper we propose a new selection method to pick discriminative features by using information measurement. The main characteristic of our selection method is that the selection procedure works like feature clustering in a hierarchically agglomerative way, where each feature is considered as a cluster and the between-cluster and within-cluster distances are measured by mutual information and the coefficient of relevancy respectively. Consequently, the final aggregated cluster is the selection result, which has the minimal redundancy among its members and the maximal relevancy with the class labels. The simulation experiments on seven datasets show that the proposed method outperforms other popular feature selection algorithms in classification performance.