Finding informative genes from microarray data is an important problem in bioinformatics research and applications. Most existing methods rank features by their discriminative capability and then select a subset of discriminative genes (usually the top k genes). In particular, the t-statistic criterion and its variants have been adopted extensively. Methods of this kind rely on the statistical principle of the t-test, which requires that the data follow a normal distribution. However, according to our investigation, the normality condition often cannot be met in real data sets.

To avoid the normality assumption, in this paper we propose a rank sum test method for informative gene discovery. The method uses a rank-sum statistic as the ranking criterion. Moreover, we propose using the significance level threshold, instead of the number of informative genes, as the parameter. As a parameter, the significance level threshold carries a statistical specification of quality. We follow the Pitman efficiency theory to show that, in theory, the rank sum method is more accurate and more robust than the t-statistic method.

To verify the effectiveness of the rank sum method, we use a support vector machine (SVM) to construct classifiers based on the identified informative genes on two well-known data sets, namely the colon data and the leukemia data. The prediction accuracy reaches 96.2% on the colon data and 100% on the leukemia data. These results are clearly better than those of previous feature ranking methods. Our experiments also verify that using a significance level threshold is more effective than directly specifying an arbitrary k.
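The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the statistic, so the sketch assumes a two-class Wilcoxon rank-sum test with midranks for ties and a two-sided p-value from the normal approximation (without a tie correction in the variance). The function names `rank_sum_p_value` and `select_informative_genes`, the data layout (one row per gene, one column per sample), and the toy data are hypothetical. The SVM classification step is omitted.

```python
import math


def rank_sum_p_value(x, y):
    """Two-sided rank-sum p-value for samples x and y (normal approximation).

    Assumption: ties receive midranks, and the variance is not tie-corrected.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def select_informative_genes(expression, labels, alpha):
    """Return indices of genes with p-value below alpha, most significant first.

    expression: list of rows, one per gene, each holding one value per sample.
    labels: list of 0/1 class labels, one per sample.
    """
    scored = []
    for gene_index, row in enumerate(expression):
        class0 = [v for v, lab in zip(row, labels) if lab == 0]
        class1 = [v for v, lab in zip(row, labels) if lab == 1]
        p = rank_sum_p_value(class0, class1)
        if p < alpha:
            scored.append((p, gene_index))
    return [gene_index for _, gene_index in sorted(scored)]


# Toy data (hypothetical): 2 genes, 20 samples, 10 per class.
labels = [0] * 10 + [1] * 10
separated = list(range(1, 11)) + list(range(11, 21))      # classes fully separated
interleaved = list(range(1, 20, 2)) + list(range(2, 21, 2))  # classes interleaved
expression = [separated, interleaved]

selected = select_informative_genes(expression, labels, alpha=0.01)
print(selected)
```

Note that the number of selected genes is not fixed in advance here: it follows from the chosen significance level `alpha`, which is the parameterization the abstract argues for in place of an arbitrary k.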