Scaling up feature selection by means of democratization

  • Authors:
  • Aida de Haro-García;Nicolás García-Pedrajas

  • Affiliations:
  • Department of Computing and Numerical Analysis, University of Córdoba, Spain;Department of Computing and Numerical Analysis, University of Córdoba, Spain

  • Venue:
  • IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
  • Year:
  • 2010

Abstract

The overwhelming amount of data available nowadays makes many existing machine learning algorithms inapplicable to many real-world problems. Two approaches have been used to deal with this problem: scaling up data mining algorithms [1] and data reduction. However, scaling up a given algorithm is not always feasible. One of the most common methods for data reduction is feature selection, but on large problems the scalability of feature selection itself becomes an issue. This paper presents a way of removing this difficulty by running several rounds of feature selection on subsets of the original dataset and combining the results with a voting scheme. The performance is very good in terms of testing error and storage reduction, while the execution time of the process is decreased very significantly. The method is especially efficient when the feature selection algorithm used has a high computational cost. An extensive comparison on 27 medium and large datasets from the UCI Machine Learning Repository, using different classifiers, shows the usefulness of our method.
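
The idea of several rounds of feature selection on subsets, combined by voting, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how subsets are drawn or how votes are turned into a final selection, so the random disjoint partitioning, the fixed vote threshold (`min_vote_fraction`), the stand-in base selector (`select_top_k`, a simple class-mean-difference ranking), and the toy data generator are all assumptions made for illustration.

```python
import random
from collections import Counter


def select_top_k(rows, labels, k):
    """Stand-in base selector (an assumption, not from the paper): rank
    features by the absolute difference of class means for 0/1 labels."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # subset holds a single class: no evidence
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def democratic_feature_selection(rows, labels, base_selector, n_rounds=5,
                                 subset_size=50, min_vote_fraction=0.5,
                                 seed=0):
    """Run base_selector on small disjoint subsets over several rounds and
    keep the features that collect enough votes.

    The fixed threshold is an illustrative assumption; the abstract does
    not state how the vote threshold is chosen.
    """
    rng = random.Random(seed)
    votes = Counter()
    n_runs = 0
    indices = list(range(len(rows)))
    for _ in range(n_rounds):
        rng.shuffle(indices)  # a new random partition in every round
        for start in range(0, len(indices), subset_size):
            chunk = indices[start:start + subset_size]
            selected = base_selector([rows[i] for i in chunk],
                                     [labels[i] for i in chunk])
            votes.update(selected)  # one vote per selected feature
            n_runs += 1
    threshold = min_vote_fraction * n_runs
    return sorted(j for j, v in votes.items() if v >= threshold)


def make_toy_data(n_rows=200, n_noise=4, seed=1):
    """Hypothetical data: features 0 and 1 track the label, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        y = i % 2
        informative = [5.0 * y + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    chosen = democratic_feature_selection(
        rows, labels, lambda r, l: select_top_k(r, l, 2))
    print(chosen)
```

Each call to the base selector sees only `subset_size` instances, so a selector whose cost grows faster than linearly with the number of instances does far less work per call than it would on the full dataset. This is consistent with the abstract's remark that the gain is largest for computationally expensive selectors.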