Objective: To propose a new flexible and sparse classifier that results in interpretable decision support systems.

Methods: Support vector machines (SVMs) are powerful methods for obtaining classifiers for complex problems. Although their performance is consistently high, and non-linearities and interactions between variables can be handled efficiently with non-linear kernels such as the radial basis function (RBF) kernel, their lack of transparency hampers their use in domains where interpretability is an issue. Many feature selection algorithms have been developed to allow for some interpretation, but the impact of the individual input variables on the prediction still remains unclear. Alternative models using additive kernels are restricted to main effects, reducing their usefulness in many applications. This paper proposes a new approach that expands the RBF kernel into interpretable and visualizable components, including main and two-way interaction effects. To obtain a sparse model representation, an iterative l1-regularized parametric model using the interpretable components as inputs is proposed.

Results: Results on toy problems illustrate the ability of the method to select the correct contributions and show improved performance over standard RBF classifiers in the presence of irrelevant input variables. For a 10-dimensional XOR problem, an SVM using the standard RBF kernel obtains an area under the receiver operating characteristic curve (AUC) of 0.947, whereas the proposed method achieves an AUC of 0.997 and additionally identifies the relevant components. In a second 10-dimensional artificial problem, where the underlying class probability follows a logistic regression model, an SVM with the RBF kernel yields an AUC of 0.975, as opposed to 0.994 for the presented method. The proposed method is also applied to two benchmark datasets: the Pima Indians diabetes dataset and the Wisconsin breast cancer dataset. The AUC is in both cases comparable to that of the standard method (0.826 versus 0.826, and 0.990 versus 0.996) and to values reported in the literature. The selected components are consistent with those obtained by other approaches reported in the literature. In contrast to those approaches, however, this method can visualize the effect of each component, allowing experts in the application domain to interpret the learned logic.

Conclusions: This work proposes a new method to obtain flexible and sparse risk prediction models. The proposed method performs as well as an SVM using the standard RBF kernel, but has the additional advantage that the resulting model can be interpreted by experts in the application domain.
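The pipeline described in the Methods section can be sketched in a few steps: build one kernel-feature block per main effect and per two-way interaction (each block evaluated against a set of reference points), then fit an l1-regularized logistic model on the stacked blocks so that uninformative components are driven to zero weight. The sketch below is a minimal illustration of that idea, not the authors' implementation: the Gaussian component kernels, the choice of reference points, the ISTA (proximal gradient) solver, the regularization strength `lam`, and the XOR-style toy problem are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, gamma):
    # A: (n, k) points, B: (m, k) reference points -> (n, m) Gaussian kernel matrix
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def expand_components(X, Z, gamma=2.0):
    """One kernel-feature block per main effect x_j and per two-way
    interaction (x_j, x_k), each evaluated against the reference set Z."""
    n, d = X.shape
    blocks, names = [], []
    for j in range(d):                               # main effects
        blocks.append(rbf(X[:, [j]], Z[:, [j]], gamma))
        names.append(f"x{j}")
    for j in range(d):                               # two-way interactions
        for k in range(j + 1, d):
            blocks.append(rbf(X[:, [j, k]], Z[:, [j, k]], gamma))
            names.append(f"x{j}*x{k}")
    return blocks, names

def l1_logistic(F, y, lam=0.02, iters=3000):
    """ISTA: gradient step on the logistic loss, then soft-thresholding.
    The step size is set from the spectral norm of F, a safe bound for
    the logistic loss (Hessian <= F'F / 4n)."""
    n = len(y)
    lr = 4.0 * n / (np.linalg.norm(F, 2) ** 2)
    w = np.zeros(F.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-F @ w))
        w -= lr * (F.T @ (p - y) / n)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

# Toy XOR-style problem (assumed here): the label depends only on the
# (x0, x1) interaction; x2 and x3 are irrelevant.
X = rng.uniform(-1, 1, size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
Z = X[:30]                                           # reference points
blocks, names = expand_components(X, Z)
F = np.column_stack(blocks)
F = (F - F.mean(0)) / (F.std(0) + 1e-12)             # standardize columns

w = l1_logistic(F, y)

# Per-component weight mass: a component with zero mass has been dropped.
splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
mass = {nm: np.abs(g).sum() for nm, g in zip(names, np.split(w, splits))}
acc = ((F @ w > 0) == (y > 0.5)).mean()
```

On this toy problem, the interaction block `x0*x1` is the only component that can express the XOR structure, so it typically accumulates most of the weight mass while the l1 penalty zeroes out many of the remaining feature weights; the per-component contributions `F_block @ w_block` are one-dimensional or two-dimensional functions and can be plotted directly, which is the interpretability argument of the abstract. A group-sparsity penalty (group lasso) over the blocks would be the natural refinement for selecting whole components at once.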