Semi-supervised clustering algorithm for haplotype assembly problem based on MEC model

  • Authors:
  • Xin-Shun Xu;Ying-Xin Li

  • Affiliations:
  • School of Computer Science and Technology, Shandong University, Jinan 250101, China/ The National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;Institute of Machine Vision and Machine Intelligence, Beijing Jingwei Textile Machinery New Technology Co., Ltd., No. 8 Yongchang Zhong Road, BDA, Beijing 100176, China

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2012

Abstract

Haplotype assembly is the problem of inferring a pair of haplotypes from localized polymorphism data. This paper proposes a semi-supervised clustering algorithm for the problem, SSK (Semi-Supervised K-means), which is, to our knowledge, the first semi-supervised clustering method applied to it. SSK first extracts some positive information and then uses that information to guide k-means in clustering all SNP fragments into two sets, from which the two haplotypes are reconstructed. The performance of SSK is tested on both real and simulated data. The results show that it outperforms several state-of-the-art algorithms under the Minimum Error Correction (MEC) model.
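
To make the clustering idea concrete, the following is a minimal Python sketch of a generic seeded 2-means over SNP fragments, together with an MEC score for a given pair of haplotypes. It is not the authors' SSK implementation. The abstract does not say what form the "positive information" takes, so the sketch assumes it is a small set of fragments whose cluster is already known (the hypothetical `seeds` argument). The fragment encoding (0/1 alleles, `None` for uncovered sites), the tie-breaking rules, and all function names are likewise assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed encoding: one entry per SNP site, 0 or 1 for the observed allele,
# None where the fragment does not cover the site.
Fragment = Sequence[Optional[int]]


def distance(frag: Fragment, hap: Sequence[int]) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(frag, hap) if f is not None and f != h)


def consensus(frags: List[Fragment], n_sites: int) -> List[int]:
    """Majority vote per site; ties and uncovered sites default to allele 0."""
    hap = []
    for j in range(n_sites):
        ones = sum(1 for f in frags if f[j] == 1)
        zeros = sum(1 for f in frags if f[j] == 0)
        hap.append(1 if ones > zeros else 0)
    return hap


def mec_score(frags: List[Fragment], h1: Sequence[int], h2: Sequence[int]) -> int:
    """Corrections needed so every fragment matches the closer haplotype."""
    return sum(min(distance(f, h1), distance(f, h2)) for f in frags)


def seeded_two_means(
    frags: List[Fragment], seeds: Dict[int, int], max_iter: int = 50
) -> Tuple[List[int], List[List[int]]]:
    """Cluster fragments into two sets; seeded fragments keep their label.

    `seeds` maps a fragment index to cluster 0 or 1 (assumed form of the
    prior knowledge; ideally at least one seed per cluster).
    """
    n_sites = len(frags[0])
    # Initial haplotypes come from the seeded fragments only.
    haps = [
        consensus([frags[i] for i, c in seeds.items() if c == k], n_sites)
        for k in (0, 1)
    ]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = []
        for i, f in enumerate(frags):
            if i in seeds:
                new_labels.append(seeds[i])  # supervision is never overridden
            else:
                d0, d1 = distance(f, haps[0]), distance(f, haps[1])
                new_labels.append(0 if d0 <= d1 else 1)
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: rebuild each haplotype from its current cluster.
        haps = [
            consensus([f for f, c in zip(frags, labels) if c == k], n_sites)
            for k in (0, 1)
        ]
    return labels, haps


if __name__ == "__main__":
    # Toy data: the last fragment carries one erroneous allele at site 2.
    toy = [
        [0, 0, 0, None],
        [1, 1, 1, None],
        [None, 0, 0, 0],
        [None, 1, 1, 1],
        [0, 0, 1, 0],
    ]
    labels, haps = seeded_two_means(toy, seeds={0: 0, 1: 1})
    print(labels, haps, mec_score(toy, haps[0], haps[1]))
```

Fixing the seeded labels is only one way to inject prior knowledge into k-means; pairwise constraints between fragments would be another, and the abstract does not indicate which mechanism SSK uses.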