Iteratively Selecting Feature Subsets for Mining from High-Dimensional Databases

  • Authors: Hiroshi Mamitsuka

  • Venue: PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery

  • Year: 2002

Abstract

We propose a new data mining method that is effective for mining from extremely high-dimensional databases. The method iteratively selects a subset of features from a database and builds a hypothesis with that subset. Each selection of a feature subset has two steps: first, selecting a subset of instances from the database for which the predictions of the previously obtained hypotheses are most unreliable; then, selecting a subset of features whose distribution of values in the selected instances differs most from that in all instances of the database. We empirically evaluate the effectiveness of the proposed method by comparing its performance with that of two other methods, one of which is the method of Xing et al., one of the latest feature subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods, both in terms of the final predictive accuracy and in terms of the precision attained at the recall given by Xing et al.'s method. We have also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at higher noise levels.
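The two-step selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not state: features and labels are binary, "unreliable" is measured by how close the committee vote is to an even split, the distribution difference is measured as the absolute difference of feature means, the first subset is drawn at random, each hypothesis is trained on all instances, and a nearest-centroid classifier stands in for the base learner. All names (`CentroidHypothesis`, `select_unreliable_instances`, `select_features`, and so on) are hypothetical and are not taken from the paper.

```python
import random


class CentroidHypothesis:
    """Stand-in base learner (an assumption): nearest class centroid on a feature subset."""

    def __init__(self, features):
        self.features = list(features)

    def fit(self, X, y):
        # Both classes (0 and 1) must appear in y, otherwise a centroid is undefined.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in self.features]
        return self

    def predict_one(self, x):
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(self.features, self.centroids[label]))
        return 0 if dist(0) <= dist(1) else 1


def vote_fraction(hypotheses, x):
    """Fraction of hypotheses that vote for class 1 on instance x."""
    return sum(h.predict_one(x) for h in hypotheses) / len(hypotheses)


def select_unreliable_instances(hypotheses, X, n_instances):
    """Step 1: indices of instances whose committee vote is closest to an even split."""
    margins = [abs(vote_fraction(hypotheses, x) - 0.5) for x in X]
    return sorted(range(len(X)), key=lambda i: margins[i])[:n_instances]


def select_features(X, instance_idx, n_features):
    """Step 2: features whose mean over the selected instances differs most from the overall mean."""
    def mean(rows, f):
        return sum(r[f] for r in rows) / len(rows)

    chosen = [X[i] for i in instance_idx]
    shift = [abs(mean(chosen, f) - mean(X, f)) for f in range(len(X[0]))]
    return sorted(range(len(shift)), key=lambda f: -shift[f])[:n_features]


def iterative_feature_subset_mining(X, y, n_rounds=5, n_instances=20, n_features=10, seed=0):
    rng = random.Random(seed)
    # Assumption: the first subset is random because no hypotheses exist yet.
    first = rng.sample(range(len(X[0])), n_features)
    hypotheses = [CentroidHypothesis(first).fit(X, y)]
    for _ in range(n_rounds - 1):
        idx = select_unreliable_instances(hypotheses, X, n_instances)
        feats = select_features(X, idx, n_features)
        # Assumption: each hypothesis is trained on all instances, restricted to feats.
        hypotheses.append(CentroidHypothesis(feats).fit(X, y))
    return hypotheses


def predict(hypotheses, X):
    """Final prediction by majority vote (ties go to class 1)."""
    return [1 if vote_fraction(hypotheses, x) >= 0.5 else 0 for x in X]
```

The sketch uses plain Python lists to stay dependency-free; on a data set with roughly 140,000 features, a vectorized or sparse representation would be the more practical choice. The values of `n_rounds`, `n_instances`, and `n_features` are placeholders, not settings reported by the author.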