Single nucleotide polymorphisms (SNPs) are believed to determine human differences and, to some degree, give biomedical researchers a way to predict the risk of some diseases and to explain patients' different responses to drug regimens. The HapMap Project makes millions of SNPs available, but the sheer size of the data also poses a major challenge for SNP research. Inspired by the recent work on population classification by Park et al. (2006), we attempt to find as few SNPs as possible among the original nearly 4 million SNPs that still classify the 3 populations in the HapMap genotype data. In this paper, we propose to first rank SNPs with a modified t-test measure and then combine the ranking with a classifier, e.g., the support vector machine, to find the optimal SNP subset. Compared with the result of Park et al., our proposed method is more efficient in ranking features and classifying the three populations: we obtained perfect classification using only 11 SNPs, in comparison with the 82 SNPs used by Park et al.
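
The rank-then-classify procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not define the modified t-test, so the score used here (largest class-mean deviation from the overall mean, scaled by the pooled within-class standard deviation) is an assumed stand-in. To keep the sketch dependency-free, a nearest-centroid classifier with leave-one-out evaluation is swapped in for the support vector machine. The function names, the 0/1/2 genotype encoding, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def t_scores(X, y):
    """Assumed multi-class t-like score per SNP (not the paper's exact formula)."""
    n, d = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(d):
        overall = sum(row[j] for row in X) / n
        means = {c: sum(r[j] for r in rows) / len(rows) for c, rows in groups.items()}
        ss = sum((r[j] - means[c]) ** 2 for c, rows in groups.items() for r in rows)
        # Small constant avoids division by zero for SNPs constant within every class.
        pooled_std = math.sqrt(ss / (n - len(groups))) + 1e-9
        scores.append(max(
            abs(means[c] - overall) / (math.sqrt(1 / len(rows) + 1 / n) * pooled_std)
            for c, rows in groups.items()
        ))
    return scores


def predict_nearest_centroid(train_X, train_y, x, feats):
    """Stand-in classifier (the paper uses an SVM): label of the closest class centroid."""
    groups = defaultdict(list)
    for row, label in zip(train_X, train_y):
        groups[label].append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        dist = sum((sum(r[j] for r in rows) / len(rows) - x[j]) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy using only the SNP columns listed in feats."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += predict_nearest_centroid(train_X, train_y, X[i], feats) == y[i]
    return hits / len(X)


def select_snps(X, y, target=1.0):
    """Add SNPs in rank order until the classifier reaches the target accuracy."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    for k in range(1, len(ranked) + 1):
        if loo_accuracy(X, y, ranked[:k]) >= target:
            return ranked[:k]
    return ranked


# Toy genotypes (rows: individuals, columns: SNPs coded 0/1/2); made-up data.
X = [
    [0, 0, 0, 1], [1, 0, 0, 2], [2, 1, 0, 0], [1, 0, 0, 1],  # pop1
    [0, 2, 1, 1], [1, 2, 1, 0], [2, 1, 1, 2], [1, 2, 1, 1],  # pop2
    [0, 2, 2, 2], [1, 1, 2, 1], [2, 2, 2, 0], [1, 2, 2, 1],  # pop3
]
y = ["pop1"] * 4 + ["pop2"] * 4 + ["pop3"] * 4
selected = select_snps(X, y)
```

On this toy data the third SNP (index 2) separates the three groups on its own, so the search stops after one feature. Note that the ranking here is computed on all samples, including the one held out during evaluation; a careful experiment would rank inside each training fold to avoid selection bias.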