Feature selection is often applied to high-dimensional data before classification learning. Using the same training dataset for both selection and learning can result in so-called feature subset selection bias. This bias can putatively exacerbate over-fitting and degrade classification performance. In current practice, however, separate datasets are seldom used for selection and learning, because dividing the training data between the two tasks reduces the amount of data available to each. This work attempts to address this dilemma. We formalize selection bias for classification learning, analyze its statistical properties, and study, through various experiments, the factors that affect selection bias and how the bias impacts classification learning. This research endeavors to illustrate and explain why the bias may not affect classification as negatively as is expected in regression.
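
To make the notion of selection bias concrete, the following is a minimal sketch under stated assumptions, not the formalization used in this work. It generates pure-noise features with random balanced labels, picks the top-k features by absolute Pearson correlation, and compares how relevant those features look on the data used for selection with how relevant they look on an independent sample. The function names, the correlation-based ranking, and the default sizes (n, p, k) are all illustrative choices.

```python
import numpy as np


def abs_correlations(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def selection_bias_demo(n=50, p=1000, k=10, seed=0):
    """Return (mean |r| on selection data, mean |r| on fresh data).

    Assumes n is even so the two classes can be balanced.
    All features are noise, so any apparent relevance is spurious.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0.0, 1.0], n // 2)

    # Sample used for feature selection.
    X_sel = rng.standard_normal((n, p))
    y_sel = rng.permutation(labels)

    # Independent sample drawn from the same (noise) distribution.
    X_new = rng.standard_normal((n, p))
    y_new = rng.permutation(labels)

    # Rank features on the selection sample and keep the k best.
    scores = abs_correlations(X_sel, y_sel)
    top = np.argsort(scores)[-k:]

    same_data = scores[top].mean()
    fresh_data = abs_correlations(X_new, y_new)[top].mean()
    return same_data, fresh_data


if __name__ == "__main__":
    same, fresh = selection_bias_demo()
    print(f"mean |r| of selected features on selection data: {same:.3f}")
    print(f"mean |r| of the same features on fresh data:     {fresh:.3f}")
```

In this sketch the selected features look far more relevant on the sample they were chosen from than on the independent sample, which is the optimism that a classifier trained on the same data would inherit. How much that optimism actually harms classification accuracy is the question the work studies; the sketch does not address it.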