The t-statistic is widely used for gene ranking in the analysis of microarray gene expression data. This filter-based criterion is generally computed using all the training samples, although not all of them are equally important for the classification task. In this paper, we decompose the t-statistic into two parts, corresponding to relevant and irrelevant data points. The relevant data points are selected using support vectors and are then used to compute the t-statistic for feature selection. By simultaneously selecting data points and genes, significantly better classification results are achieved on synthetic data as well as on several benchmark cancer datasets.
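The idea of ranking genes by a t-statistic computed only on selected samples can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `t_scores`, the toy data, and the index list `support_idx` are hypothetical, the unequal-variance (Welch) form of the t-statistic is an assumption, and the support-vector selection step is not shown (the indices are simply supplied by hand, as they might be after training an SVM).

```python
from math import sqrt
from statistics import mean, variance


def t_scores(X, y, keep=None):
    """Absolute two-sample t-score for each gene (column of X).

    X    : list of samples, each a list of expression values
    y    : class labels, +1 or -1
    keep : optional indices of the samples to use (assumed to be the
           "relevant" points, e.g. support vectors); None uses all samples
    Assumes at least two retained samples per class.
    """
    rows = range(len(X)) if keep is None else keep
    pos = [X[i] for i in rows if y[i] == 1]
    neg = [X[i] for i in rows if y[i] == -1]
    scores = []
    for j in range(len(X[0])):
        a = [r[j] for r in pos]
        b = [r[j] for r in neg]
        # Welch form (an assumption): unequal class variances allowed
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


# Toy data: 6 samples x 2 genes (illustrative values only)
X = [
    [1.0, 5.0],
    [1.2, 5.1],
    [3.0, 4.9],
    [3.1, 5.2],
    [0.0, 9.0],  # lies far from the other class
    [4.5, 1.0],  # lies far from the other class
]
y = [-1, -1, 1, 1, -1, 1]

# Hypothetical indices of the relevant samples (hand-picked here)
support_idx = [0, 1, 2, 3]

all_scores = t_scores(X, y)
sv_scores = t_scores(X, y, keep=support_idx)

# Rank genes from most to least discriminative on the selected samples
ranking = sorted(range(len(sv_scores)), key=lambda j: sv_scores[j], reverse=True)
```

In this toy case the second gene separates the classes only through the two distant samples, so restricting the computation to the hand-picked points changes its score substantially, which is the kind of effect the sample selection is meant to expose.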