A novel feature selection method for large-scale data sets

  • Authors:
  • Wei-Chou Chen; Ming-Chun Yang; Shian-Shyong Tseng

  • Affiliations:
  • Department of Computer and Information Science, National Chiao Tung University, Hsinchu 300, Taiwan. E-mail: sirius@cis.nctu.edu.tw (listed for all three authors)

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Feature selection is the task of finding the useful (relevant) features that describe an application domain. Finding a minimal subset of features that can describe all of the concepts in a given data set is NP-hard. We previously proposed a feature selection method, based on rough set theory and bitmap indexing techniques, that efficiently selects the optimal (minimal) feature set for a given data set. Although that method guarantees an optimal solution, its computational cost is very high when the number of features is large. In this paper, we propose a nearly optimal feature selection method, called the bitmap-based feature selection method with discernibility matrix, which reduces processing time by using a discernibility matrix to record the important features during construction of the cleansing tree. We also propose the corresponding indexing and selection algorithms. Finally, experiments and comparisons demonstrate the efficiency and accuracy of the proposed method.
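
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a discernibility matrix, not the authors' method. It assumes a small table of discrete-valued records with class labels, stores each matrix entry as an integer bitmask (bit j set when feature j distinguishes two records of different classes), and then uses a plain greedy covering heuristic to pick a small, not necessarily minimal, feature set. The bitmap index and the cleansing tree from the paper are omitted, and the names `discernibility_entries` and `greedy_select` are hypothetical.

```python
from itertools import combinations


def discernibility_entries(rows, labels):
    """Return one bitmask per pair of records with different labels.

    Bit j of a mask is set when feature j takes different values
    in the two records, i.e. feature j can tell them apart.
    """
    entries = []
    for (a, ya), (b, yb) in combinations(zip(rows, labels), 2):
        if ya == yb:
            continue  # same concept: nothing to discern
        mask = 0
        for j, (va, vb) in enumerate(zip(a, b)):
            if va != vb:
                mask |= 1 << j
        if mask:  # skip inconsistent pairs that no feature can separate
            entries.append(mask)
    return entries


def greedy_select(entries, n_features):
    """Greedily pick features until every entry contains a chosen feature.

    This is an assumed heuristic for illustration; it yields a small
    covering set but does not guarantee the minimal one.
    """
    selected = []
    remaining = list(entries)
    while remaining:
        # Choose the feature that distinguishes the most uncovered pairs.
        best = max(range(n_features),
                   key=lambda j: sum((m >> j) & 1 for m in remaining))
        selected.append(best)
        remaining = [m for m in remaining if not (m >> best) & 1]
    return sorted(selected)


if __name__ == "__main__":
    # Toy data (made up): the label depends only on feature 1.
    rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
    labels = ["a", "b", "a", "b"]
    entries = discernibility_entries(rows, labels)
    print(greedy_select(entries, n_features=3))
```

On the toy data, every pair of records with different labels differs in feature 1, so a single feature is enough to cover all entries. Representing each entry as a bitmask keeps the coverage test to a shift and a mask, which loosely echoes the bitmap flavor of the paper without reproducing its index structure.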