Scaling up feature selection by means of democratization

Authors:
Aida de Haro-García;Nicolás García-Pedrajas
Affiliations:
Department of Computing and Numerical Analysis, University of Córdoba, Spain;Department of Computing and Numerical Analysis, University of Córdoba, Spain
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Year:
2010

Citing 17

A practical approach to feature selection. ML92 Proceedings of the ninth international workshop on Machine learning
Original Contribution: Training a 3-node neural network is NP-complete. Neural Networks
Estimating attributes: analysis and extensions of RELIEF. ECML-94 Proceedings of the European conference on machine learning on Machine Learning
Selection of relevant features and examples in machine learning. Artificial Intelligence - Special issue on relevance
Wrappers for feature subset selection. Artificial Intelligence - Special issue on relevance
Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning
Feature Extraction, Construction and Selection: A Data Mining Perspective. Feature Extraction, Construction and Selection: A Data Mining Perspective
A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery
On Issues of Instance Selection. Data Mining and Knowledge Discovery
Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering
Feature selection for high-dimensional genomic microarray data. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Feature Subset Selection and Order Identification for Unsupervised Learning. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Feature Selection for Clustering - A Filter Solution. ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A Branch and Bound Algorithm for Feature Subset Selection. IEEE Transactions on Computers
A review of feature selection techniques in bioinformatics. Bioinformatics
Democratic instance selection: A linear complexity instance selection algorithm based on classifier ensemble concepts. Artificial Intelligence

Abstract

The sheer volume of data now available makes many existing machine learning algorithms inapplicable to real-world problems. Two approaches have been used to address this: scaling up data mining algorithms [1] and data reduction. Scaling up a given algorithm, however, is not always feasible. Feature selection is one of the most common data reduction methods, but on large problems its own scalability becomes an issue. This paper presents a way to remove this difficulty: feature selection is run for several rounds on subsets of the original dataset, and the results are combined using a voting scheme. The method performs very well in terms of testing error and storage reduction, while the execution time of the process decreases very significantly. It is especially efficient when the underlying feature selection algorithm has a high computational cost. An extensive comparison on 27 medium and large datasets from the UCI Machine Learning Repository, using different classifiers, shows the usefulness of the method.
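
The abstract does not specify how the subsets are drawn, which base selector is used, or how votes are turned into a final feature set, so the following Python sketch only illustrates the general idea under stated assumptions: instances are randomly partitioned into disjoint subsets in each round, a simple correlation-based ranking stands in for the base feature selection algorithm, and a feature is kept if it wins at least a fixed share of the votes. The names `correlation_selector`, `vote_feature_selection`, and `min_vote_share` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter
from functools import partial


def correlation_selector(rows, labels, k):
    """Stand-in base selector (an assumption, not the paper's choice):
    keep the k features with the largest absolute Pearson correlation
    with the class label."""
    n = len(rows)
    mean_y = sum(labels) / n
    var_y = sum((y - mean_y) ** 2 for y in labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, labels))
        var_x = sum((x - mean_x) ** 2 for x in col)
        denom = (var_x * var_y) ** 0.5
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def vote_feature_selection(rows, labels, selector, rounds=5, n_subsets=4,
                           min_vote_share=0.5, seed=0):
    """Run `selector` on disjoint random subsets of the instances for
    several rounds and keep features chosen in at least `min_vote_share`
    of all runs (a fixed threshold is an assumption of this sketch)."""
    rng = random.Random(seed)
    votes = Counter()
    runs = 0
    indices = list(range(len(rows)))
    for _ in range(rounds):
        rng.shuffle(indices)  # new random partition each round
        for s in range(n_subsets):
            part = indices[s::n_subsets]  # disjoint slice of instances
            sub_rows = [rows[i] for i in part]
            sub_labels = [labels[i] for i in part]
            votes.update(selector(sub_rows, sub_labels))  # one vote per selected feature
            runs += 1
    return sorted(j for j, v in votes.items() if v / runs >= min_vote_share)


# Synthetic data: only features 0 and 3 determine the label.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(400)]
labels = [1 if r[0] + r[3] > 0 else 0 for r in rows]

selected = vote_feature_selection(rows, labels,
                                  partial(correlation_selector, k=2))
print(selected)
```

Because each run of the base selector sees only a fraction of the instances, its cost drops accordingly, which is where a saving would come from when the selector scales worse than linearly with the number of instances. The vote threshold trades off how aggressively features are discarded against how stable the selection is across subsets.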