Finding the most predictive features for statistical classification is a challenging problem with important applications. Support Vector Machines (SVMs), for example, have been used successfully with a recursive procedure to select the most important genes for cancer prediction. It is not well understood, however, how much of that success depends on the choice of classifier and how much on the recursive procedure. We answer this question by examining multiple classifiers (SVM, ridge regression, and Rocchio) with feature selection in recursive and non-recursive settings, on a DNA microarray dataset (AMLALL) and a text categorization benchmark (Reuters-21578). We found recursive ridge regression to be the most effective: its best classification performance on the AMLALL dataset (zero error) was obtained using only 3 genes (selected from over 7000), improving on the best published result on the same benchmark, the zero error of recursive SVM using 8 genes. On Reuters-21578, recursive ridge regression also achieved the best result published to date (the improvement was verified with a significance test). An in-depth analysis of the experimental results shows that the choice of classifier heavily influences the recursive feature selection process: the ridge regression classifier penalizes redundant features to a much greater extent than the SVM does.
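The recursive procedure the abstract compares across classifiers can be illustrated with a minimal NumPy sketch (not the authors' code): at each step, refit a ridge regression model on the surviving features and eliminate the feature whose weight has the smallest magnitude. The function names, the toy data, and the regularization value below are illustrative assumptions.

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def recursive_ridge_selection(X, y, n_keep, lam=1.0):
    """Recursive feature elimination driven by ridge weights:
    repeatedly refit on the surviving features and drop the one
    with the smallest |w|, until n_keep features remain."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w = ridge_weights(X[:, active], y, lam)
        del active[int(np.argmin(np.abs(w)))]
    return active

# Toy data: y depends only on features 0 and 1; the other 8 are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)

selected = recursive_ridge_selection(X, y, n_keep=2)
print(sorted(selected))
```

On this toy problem the two signal-carrying features survive the elimination, while the noise features, whose ridge weights stay near zero, are pruned first. The same loop applies to any linear classifier that exposes per-feature weights, which is what makes the SVM-vs-ridge comparison in the abstract possible.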