Contemporary biological technologies produce extremely high-dimensional data sets from which to design classifiers, with 20,000 or more potential features being commonplace. In addition, sample sizes tend to be small. In such settings, feature selection is an inevitable part of classifier design. There have been a number of comparative studies of feature selection, but they have either considered settings with much smaller dimensionality than those occurring in current bioinformatics applications or restricted their study to a few real data sets. This study compares some basic feature-selection methods in settings involving thousands of features, using both model-based synthetic data and real data. It defines distribution models involving different numbers of markers (useful features) versus non-markers (useless features) and different kinds of relations among the features. Under this framework, it evaluates the performance of feature-selection algorithms for different distribution models and classifiers. Both the classification error and the number of discovered markers are computed. Although the results clearly show that none of the considered feature-selection methods performs best across all scenarios, there are some general trends relative to sample size and relations among the features. For instance, the classifier-independent univariate filter methods show similar trends. Filter methods such as the t-test perform as well as or better than wrapper methods on harder problems. This improved performance is usually accompanied by significant peaking. Wrapper methods perform better when the sample size is sufficiently large. ReliefF, the classifier-independent multivariate filter method, performs worse than univariate filter methods in most cases; however, ReliefF-based wrapper methods show performance similar to their t-test-based counterparts.
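The marker versus non-marker setup and the "number of discovered markers" measure can be illustrated with a minimal sketch. It is not the study's actual distribution model: it assumes two Gaussian classes with independent, unit-variance features, places the markers in the first columns, and ranks features by a Welch t-statistic. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic(n_per_class, n_features, n_markers, shift, rng):
    """Two Gaussian classes; only the first n_markers features differ in mean.

    Assumption: features are independent with unit variance, which is a
    simplification and not the distribution models used in the study.
    """
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1[:, :n_markers] += shift  # markers carry the class information
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def t_scores(X, y):
    """Absolute Welch t-statistic of each feature (univariate filter score)."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(num / den)


def select_top_k(X, y, k):
    """Indices of the k features with the largest absolute t-statistic."""
    return np.argsort(t_scores(X, y))[::-1][:k]


def count_discovered_markers(selected, n_markers):
    """Number of selected features that are true markers (indices < n_markers)."""
    return int(np.sum(np.asarray(selected) < n_markers))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_synthetic(n_per_class=30, n_features=2000,
                          n_markers=20, shift=1.0, rng=rng)
    selected = select_top_k(X, y, k=20)
    found = count_discovered_markers(selected, n_markers=20)
    print(f"{found} of 20 markers among the top 20 features")
```

Lowering `shift` or `n_per_class` makes the problem harder, so non-markers are more likely to displace markers from the top of the ranking by chance. A wrapper method would instead score candidate feature subsets by the estimated error of a specific classifier, which this sketch does not attempt.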