Optimization-based feature selection with adaptive instance sampling

  • Authors:
  • Jaekyung Yang;Sigurdur Olafsson

  • Affiliations:
  • Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA;Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA

  • Venue:
  • Computers and Operations Research
  • Year:
  • 2006

Quantified Score

Hi-index 0.03

Visualization

Abstract

Preprocessing the data to filter out redundant and irrelevant features is one of the most important steps in the data mining process. Careful feature selection may improve both the computational time of inducing subsequent models and the quality of those models. Using fewer features often leads to simpler and easier to interpret models, and selecting important feature can lead to important insights into the application. The feature selection problem is inherently a combinatorial optimization problem. This paper builds on a metaheuristic called the nested partitions method that has been shown to be particularly effective for the feature selection problem. Specifically, we focus on the scalability of the method and show that its performance is vastly improved by incorporating random sampling of instances. Furthermore, we develop an adaptive variant of the algorithm that dynamically determines the required sample rate. The adaptive algorithm is shown to perform very well when applied to a set of standard test problems.