The feature selection problem: traditional methods and a new algorithm

  • Authors: Kenji Kira; Larry A. Rendell

  • Affiliations: Computer & Information Systems Laboratory, Mitsubishi Electric Corporation, Kanagawa, Japan; Beckman Institute and Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue: AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
  • Year: 1992

Abstract

For real-world concept learning problems, feature selection is important to speed up learning and to improve concept quality. We review and analyze past approaches to feature selection and note their strengths and weaknesses. We then introduce and theoretically examine a new algorithm, Relief, which selects relevant features using a statistical method. Relief does not depend on heuristics, is accurate even if features interact, and is noise-tolerant. It requires only linear time in the number of given features and the number of training instances, regardless of the target concept complexity. The algorithm also has certain limitations, such as nonoptimal feature set size. Ways to overcome the limitations are suggested. We also report the results of tests comparing Relief with other feature selection algorithms. The empirical results support the theoretical analysis, suggesting a practical approach to feature selection for real-world problems.
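
The abstract does not spell out the procedure, so the following is only a minimal sketch of Relief as it is commonly described: sample an instance, find its nearest neighbor of the same class (the "hit") and of the other class (the "miss"), and raise the weight of features that differ on the miss while lowering the weight of features that differ on the hit. The function name `relief`, the parameters `n_samples` and `seed`, and the assumption of two classes with numeric features scaled to [0, 1] are illustrative choices, not taken from this record.

```python
import random


def relief(X, y, n_samples, seed=0):
    """Return one relevance weight per feature.

    Assumes two classes, at least two instances per class, and numeric
    features already scaled to [0, 1] (all assumptions of this sketch).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    weights = [0.0] * p

    def sq_dist(a, b):
        # Squared Euclidean distance over all features.
        return sum((u - v) ** 2 for u, v in zip(a, b))

    for _ in range(n_samples):
        i = rng.randrange(n)
        # Nearest hit: closest other instance with the same class.
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: sq_dist(X[i], X[j]))
        # Nearest miss: closest instance with a different class.
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: sq_dist(X[i], X[j]))
        for f in range(p):
            d_hit = (X[i][f] - X[hit][f]) ** 2
            d_miss = (X[i][f] - X[miss][f]) ** 2
            # Reward separation from the miss, penalize distance to the hit.
            weights[f] += (d_miss - d_hit) / n_samples
    return weights


# Toy data: feature 0 tracks the class, feature 1 is unrelated to it.
X = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.5], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y, n_samples=20)
threshold = 0.1  # hypothetical relevance threshold, chosen for this toy data
selected = [f for f, w in enumerate(weights) if w >= threshold]
```

Each sampled instance costs one pass over the training set and the feature list, so with a fixed number of samples the running time grows linearly in both, which matches the complexity claimed in the abstract. Selecting by threshold rather than by a target count is also one way to see why the resulting feature set need not be of optimal size.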