Efficiently handling feature redundancy in high-dimensional data

  • Authors:
  • Lei Yu;Huan Liu

  • Affiliations:
  • Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ

  • Venue:
  • Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.