Model, properties and imputation method of missing SNP genotype data utilizing mutual information

  • Authors:
  • Ying Wang; Weiming Wan; Rui-Sheng Wang; Enmin Feng

  • Affiliations:
  • School of Science, Dalian Jiaotong University, Dalian 116028, China; School of Science, Dalian Jiaotong University, Dalian 116028, China; School of Information, Renmin University of China, Beijing 100872, China; Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China

  • Venue:
  • Journal of Computational and Applied Mathematics
  • Year:
  • 2009

Abstract

Mutual information can be used as a measure of the association of a genetic marker, or a combination of markers, with the phenotype. In this paper, we study the imputation of missing genotype data. We first use joint mutual information to compute the dependence between SNP sites, then construct a mathematical model to find the two SNP sites that have maximal dependence with a missing SNP site, and study the properties of this model. Finally, we propose an extension of haplotype-based imputation to impute the missing values in genotype data. Extensive experiments were performed to verify the method. The numerical results show that it is superior to haplotype-based imputation methods, and they indicate that joint mutual information can better measure the dependence between SNP sites. The experimental results also show that adjacent SNP sites do not necessarily have the strongest dependence.
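
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the abstract does not give the formulation, so the sketch assumes genotypes coded as small integers (with `None` for missing), estimates the joint mutual information I(T; A, B) = H(T) + H(A, B) - H(T, A, B) from empirical counts, and picks the pair of sites that maximizes it for a target site. The function names `joint_mutual_information`, `best_pair`, and `impute_from_pair` are hypothetical, and the final fill step is a simple conditional-mode rule swapped in for the paper's haplotype-based extension, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def joint_mutual_information(target, site_a, site_b):
    """Estimate I(target; (site_a, site_b)) from samples with no missing value."""
    rows = [r for r in zip(target, site_a, site_b) if None not in r]
    if not rows:
        return 0.0
    h_t = entropy([t for t, _, _ in rows])
    h_ab = entropy([(a, b) for _, a, b in rows])
    h_tab = entropy(rows)
    return h_t + h_ab - h_tab


def best_pair(genotypes, target_idx):
    """Return the pair of site indices with maximal joint MI with the target site.

    genotypes: list of samples, each a list of genotype codes (None = missing).
    """
    columns = list(zip(*genotypes))
    others = [j for j in range(len(columns)) if j != target_idx]
    return max(
        combinations(others, 2),
        key=lambda p: joint_mutual_information(
            columns[target_idx], columns[p[0]], columns[p[1]]
        ),
    )


def impute_from_pair(genotypes, target_idx, pair):
    """Assumed stand-in fill rule: most frequent target genotype given the pair."""
    a, b = pair
    votes = {}
    for row in genotypes:
        if None not in (row[target_idx], row[a], row[b]):
            votes.setdefault((row[a], row[b]), Counter())[row[target_idx]] += 1
    filled = [list(row) for row in genotypes]
    for row in filled:
        key = (row[a], row[b])
        if row[target_idx] is None and key in votes:
            row[target_idx] = votes[key].most_common(1)[0][0]
    return filled
```

Because the search considers every pair of sites rather than only neighbours of the target, the selected pair may include a non-adjacent site, which matches the abstract's observation that adjacent sites do not necessarily have the strongest dependence. The exhaustive pair search is quadratic in the number of sites, so a windowed candidate set would be a natural restriction for long SNP panels.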