Feature selection in high-dimensional data sets is an open problem with no universally satisfactory method available. In this paper we discuss the requirements for such a method with respect to the various aspects of feature importance and explore them using regression random forests and symbolic regression. We study 'conventional' feature selection with both methods on several test problems and a case study, compare the results, and identify the conceptual differences in the generated feature importances. We demonstrate that random forests may overlook important variables (those significantly related to the response) for various reasons, while symbolic regression identifies all important variables provided that models of sufficient quality are found. We explain these results by the fact that the variable importances produced by the two methods have different semantics.
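The claim that random forests may overlook important variables can be illustrated with a minimal sketch (not the paper's code, and the data set is synthetic): when two features are nearly identical copies of the single variable driving the response, impurity-based random-forest importance is split between them, so each one looks roughly half as important as the underlying variable truly is.

```python
# Illustrative sketch (assumed setup, not from the paper): impurity-based
# random-forest importances dilute the credit for a relevant variable
# across its near-duplicate, while an irrelevant variable stays near zero.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # near-duplicate of x1
x3 = rng.normal(size=n)               # irrelevant variable
y = x1 + 0.1 * rng.normal(size=n)     # response depends on x1 only

X = np.column_stack([x1, x2, x3])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = rf.feature_importances_

# imp[0] and imp[1] share the importance that belongs to x1 alone;
# imp[2] remains close to zero.
print(imp)
```

A selection rule that thresholds these importances could discard x1 or x2 even though both are significantly related to the response, which is one sense in which random-forest importances and model-based (e.g. symbolic-regression) importances carry different semantics.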