Feature selection is often found to be an essential pre-processing step when data mining is applied to many-attribute datasets (e.g., several hundred or thousands of attributes). Feature selection aims to pre-select a relatively small number of attributes, thus speeding up further processing and (hopefully) eliminating data that have minimal or no discriminatory power. Often, feature selection is done on the basis of straightforward statistical correlation, discarding the features that correlate most weakly with the target class(es). However, when these correlation values are low for all features (as is common in many important datasets), the basis for pre-selecting any specific set of features is undermined, and straightforward feature selection may do more harm than good. We confirm this by investigating the performance of five feature selection strategies on several datasets with varying overall correlation values, finding that statistical correlation is never the best choice for poorly correlated data. The most reliable methods among those tested are either no feature selection or Evolutionary Algorithm feature selection.
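For illustration, the correlation-based baseline criticised above can be sketched in a few lines of Python. This is not the paper's code; the function name `correlation_rank`, the synthetic data, and the choice of Pearson correlation as the "straightforward statistical correlation" are assumptions made only to show how such a ranking filter typically works: score each attribute by its absolute correlation with the (numerically encoded) class and keep the top k.

```python
import numpy as np

def correlation_rank(X, y, k):
    """Hypothetical correlation-filter sketch (not the paper's method).

    Ranks features by absolute Pearson correlation with the target y
    and returns the column indices of the k highest-scoring features.

    X : (n_samples, n_features) array of attribute values
    y : (n_samples,) array of numerically encoded class labels
    k : number of features to keep
    """
    Xc = X - X.mean(axis=0)                       # center each attribute
    yc = y - y.mean()                             # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    denom[denom == 0] = np.inf                    # constant features score 0
    corr = (Xc * yc[:, None]).sum(axis=0) / denom # per-feature Pearson r
    return np.argsort(-np.abs(corr))[:k]          # indices of top-k features

# Toy usage on synthetic, weakly informative data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
y = (X[:, 3] + 0.5 * rng.normal(size=200) > 0).astype(float)
selected = correlation_rank(X, y, k=10)
print(selected)
```

When, as the abstract argues, all of these per-feature correlations are uniformly low, the ordering produced by `argsort` is dominated by noise, so the "top k" set carries little justification over any other subset of the same size.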