Perfect Population Classification on Hapmap Data with a Small Number of SNPs

  • Authors:
  • Nina Zhou;Lipo Wang

  • Affiliations:
  • College of Information Engineering, Xiangtan University, Xiangtan, China;Nanyang Technological University, Singapore 639798

  • Venue:
  • Neural Information Processing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The single nucleotide polymorphisms (SNPs) are believed to determine human differences and, to some degree, provide biomedical researchers a possibility of predicting risks of some diseases and explaining patients' different responses to drug regimens. With the availability of millions of SNPs in the Hapmap Project, although large amount of information about SNPs is available, the tremendous size also causes a major challenge for research on SNPs. Inspired from the recent research work on population classification by Park et al (2006), we attempt to find as few SNPs as possible from the original nearly 4 millions SNPs to classify the 3 populations in the Hapmap genotype data. In this paper, we propose to first use a modified t-test measure to rank SNPs, and then combine the ranking result with a classifier, e.g., the support vector machine, to find the optimal SNP subset. Compared with Park et al's result, our proposed method is more efficient in ranking features and classifying the three populations, i.e., we obtained perfect classification using only 11 SNPs in comparison with 82 SNPs used by Park et al.