The most prevalent techniques in Support Vector Machine (SVM) feature selection rest on the intuition that features whose weights are close to zero are not required for optimal classification. In this paper we show that, indeed, in the sample limit a linear SVM assigns zero weight to the irrelevant variables (in a theoretical, optimal sense), in both the soft-margin and the hard-margin case. However, SVM-based methods also have theoretical disadvantages. We present examples where the linear SVM may assign zero weight to strongly relevant variables (i.e., variables required for optimal estimation of the distribution of the target variable) and non-zero weight to weakly relevant features (i.e., features that are superfluous for optimal prediction given the remaining features). We contrast and theoretically compare this behavior with Markov blanket-based feature selection algorithms, which do not suffer these disadvantages on a broad class of distributions and can also be used for causal discovery.
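To make the weight-magnitude intuition concrete, below is a minimal sketch (not the paper's construction) of linear-SVM feature selection, assuming scikit-learn's LinearSVC; the feature layout, sample size, and the top-k cutoff are illustrative assumptions. It also demonstrates the weak-relevance caveat from the abstract: an exact duplicate of a predictive feature is superfluous given the original, yet the L2-regularized SVM spreads weight across both copies, so magnitude ranking keeps the redundant copy.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n = 2000

    # Feature 0 is predictive of y; feature 1 is an exact copy of it
    # (weakly relevant: superfluous given feature 0); feature 2 is
    # pure noise (irrelevant).
    x0 = rng.normal(size=n)
    X = np.column_stack([x0, x0, rng.normal(size=n)])
    y = (x0 + 0.1 * rng.normal(size=n) > 0).astype(int)

    svm = LinearSVC(C=1.0, max_iter=20000).fit(X, y)
    w = svm.coef_.ravel()
    print(np.round(np.abs(w), 3))
    # Expected pattern: the irrelevant feature's weight shrinks toward
    # zero, while the two identical features each receive a clearly
    # non-zero share of the weight.

    # Weight-magnitude selection keeps the top-k features by |w|,
    # so the redundant duplicate survives the cut here.
    k = 2
    selected = np.argsort(-np.abs(w))[:k]
    print(selected)

The duplicate survives because any split of weight between the two identical columns produces the same decision values, and the L2 penalty is minimized by an equal split; ranking by |w| therefore retains a feature that is redundant given the other, which is exactly the kind of case the paper analyzes.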