Perfect Population Classification on Hapmap Data with a Small Number of SNPs

Authors:
Nina Zhou;Lipo Wang
Affiliations:
College of Information Engineering, Xiangtan University, Xiangtan, China;Nanyang Technological University, Singapore 639798
Venue:
Neural Information Processing
Year:
2008

Citing 9
Cited 0

Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence
Haplotypes and informative SNP selection algorithms: don't block out information. RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation
An introduction to variable and feature selection. The Journal of Machine Learning Research
Data Mining with Computational Intelligence (Advanced Information and Knowledge Processing)
Support Vector Machines: Theory and Applications (Studies in Fuzziness and Soft Computing)
Choosing SNPs Using Feature Selection. CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
Tag SNP selection in genotype data for maximizing SNP prediction accuracy. Bioinformatics
Accurate Cancer Classification Using Expressions of Very Few Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Single nucleotide polymorphisms (SNPs) are believed to determine human differences and, to some degree, allow biomedical researchers to predict the risk of some diseases and to explain patients' different responses to drug regimens. The Hapmap Project makes millions of SNPs available, but while this provides a large amount of information, the sheer size of the data poses a major challenge for SNP research. Inspired by recent work on population classification by Park et al. (2006), we attempt to find as few SNPs as possible from the original set of nearly 4 million SNPs to classify the 3 populations in the Hapmap genotype data. In this paper, we propose to first rank SNPs with a modified t-test measure and then combine the ranking with a classifier, e.g., the support vector machine, to find the optimal SNP subset. Compared with Park et al.'s result, our proposed method is more efficient in ranking features and classifying the three populations: we obtained perfect classification using only 11 SNPs, compared with the 82 SNPs used by Park et al.
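
The rank-then-classify procedure described in the abstract can be illustrated with a minimal sketch. Several parts are assumptions, since the abstract does not give details: the "modified t-test" is taken here to be a multi-class t-statistic (largest standardized gap between a class mean and the overall mean, using a pooled within-class standard deviation); a nearest-centroid classifier stands in for the support vector machine so that the example needs only the Python standard library; and the function names (`t_scores`, `centroid_accuracy`, `select_snps`, `toy_genotypes`) and the synthetic 0/1/2 genotype data are hypothetical, not taken from the paper or from Hapmap.

```python
import math
import random

def t_scores(X, y):
    """Score each feature with an ASSUMED multi-class t-statistic:
    max over classes of |class mean - overall mean| / (m_c * pooled std)."""
    n, classes = len(X), sorted(set(y))
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        overall = sum(col) / n
        within, stats = 0.0, []
        for c in classes:
            vals = [v for v, lab in zip(col, y) if lab == c]
            mean_c = sum(vals) / len(vals)
            within += sum((v - mean_c) ** 2 for v in vals)
            stats.append((mean_c, len(vals)))
        # Pooled within-class std; the small constant avoids division by zero.
        s = math.sqrt(within / (n - len(classes))) + 1e-9
        scores.append(max(abs(m - overall) / (math.sqrt(1 / k + 1 / n) * s)
                          for m, k in stats))
    return scores

def centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Nearest-centroid classifier (a stand-in for the SVM) on chosen features."""
    centroids = {}
    for c in sorted(set(y_tr)):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    correct = 0
    for x, lab in zip(X_te, y_te):
        pred = min(centroids, key=lambda c: sum(
            (x[f] - m) ** 2 for f, m in zip(feats, centroids[c])))
        correct += pred == lab
    return correct / len(y_te)

def select_snps(X_tr, y_tr, X_val, y_val, max_k=20):
    """Rank features, then keep the shortest top-k prefix with the best accuracy."""
    scores = t_scores(X_tr, y_tr)
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    best_k, best_acc = 1, -1.0
    for k in range(1, min(max_k, len(ranked)) + 1):
        acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, ranked[:k])
        if acc > best_acc:  # strict '>' prefers the smaller subset on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

def toy_genotypes(per_class, n_noise, rng):
    """Synthetic data: SNPs 0 and 1 separate 3 populations, the rest are noise."""
    X, y = [], []
    for pop in range(3):
        for _ in range(per_class):
            informative = [2 if pop == 0 else 0, 2 if pop == 1 else 0]
            X.append(informative + [rng.randint(0, 2) for _ in range(n_noise)])
            y.append(pop)
    return X, y

rng = random.Random(0)
X_train, y_train = toy_genotypes(10, 8, rng)
X_val, y_val = toy_genotypes(5, 8, rng)
selected, acc = select_snps(X_train, y_train, X_val, y_val)
print("selected SNP indices:", selected, "validation accuracy:", acc)
```

The strict comparison in `select_snps` returns the smallest top-ranked subset that reaches the best validation accuracy, which mirrors the abstract's goal of using as few SNPs as possible. With real genotype data, the classifier step could be replaced by an SVM and the accuracy estimated by cross-validation.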