Semi-supervised clustering algorithm for haplotype assembly problem based on MEC model

Authors:
Xin-Shun Xu;Ying-Xin Li
Affiliations:
School of Computer Science and Technology, Shandong University, Jinan 250101, China/ The National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;Institute of Machine Vision and Machine Intelligence, Beijing Jingwei Textile Machinery New Technology Co., Ltd., No. 8 Yongchang Zhong Road, BDA, Beijing 100176, China
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2012

Abstract

Haplotype assembly is the problem of inferring a pair of haplotypes from localized polymorphism data. This paper proposes SSK (Semi-Supervised K-means), a semi-supervised clustering algorithm for the problem; to our knowledge, it is the first semi-supervised clustering method applied to haplotype assembly. SSK first extracts some positive information from the data. This information is then used to guide k-means in clustering all SNP fragments into two sets, from which the two haplotypes are reconstructed. SSK is tested on both real and simulated data. The results show that it outperforms several state-of-the-art algorithms under the Minimum Error Correction (MEC) model.
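
The abstract does not say how the positive information is extracted, or which distance function and update rule SSK uses. The following is therefore only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that the positive information is already available as groups of fragment indices that must share a cluster (must-link groups), that fragments are strings over '0', '1' and '-' (uncovered site), and that cluster centres are per-site majority-vote haplotypes. All function names and the input format are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed symbol for a SNP site not covered by a fragment


def distance(fragment, haplotype):
    """Count sites covered by both strings where the alleles disagree."""
    return sum(1 for f, h in zip(fragment, haplotype)
               if f != GAP and h != GAP and f != h)


def consensus(fragments, length):
    """Majority vote per site; ties go to '0', uncovered sites stay GAP."""
    hap = []
    for j in range(length):
        counts = Counter(f[j] for f in fragments if f[j] != GAP)
        if not counts:
            hap.append(GAP)
        else:
            hap.append("1" if counts["1"] > counts["0"] else "0")
    return "".join(hap)


def mec_score(fragments, h1, h2):
    """MEC objective: corrections needed so each fragment fits one haplotype."""
    return sum(min(distance(f, h1), distance(f, h2)) for f in fragments)


def constrained_two_means(fragments, must_link_groups, max_iter=50):
    """2-means over fragments in which each must-link group moves as one block.

    Assumes the groups are disjoint and that at least two blocks exist.
    """
    n, length = len(fragments), len(fragments[0])
    constrained = {i for group in must_link_groups for i in group}
    # Unconstrained fragments become singleton blocks.
    blocks = [list(g) for g in must_link_groups]
    blocks += [[i] for i in range(n) if i not in constrained]

    def block_cost(block, centre):
        return sum(distance(fragments[i], centre) for i in block)

    # Seed the centres with the first block and the block farthest from it.
    first = consensus([fragments[i] for i in blocks[0]], length)
    far = max(blocks[1:], key=lambda b: block_cost(b, first))
    centres = [first, consensus([fragments[i] for i in far], length)]

    labels = None
    for _ in range(max_iter):
        new_labels = [0] * n
        for block in blocks:
            closer_to_first = (block_cost(block, centres[0])
                               <= block_cost(block, centres[1]))
            for i in block:
                new_labels[i] = 0 if closer_to_first else 1
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [fragments[i] for i in range(n) if labels[i] == k]
            if members:
                centres[k] = consensus(members, length)
    return centres[0], centres[1], labels


# Toy data: fragments drawn from 01010 and 10101; the last has one error.
frags = ["010--", "-1010", "101--", "--101", "0101-", "10001"]
h1, h2, labels = constrained_two_means(frags, [[0, 1], [2, 3]])
print(h1, h2, labels, mec_score(frags, h1, h2))
```

Moving each must-link group as a single block is one simple way to make k-means respect positive constraints; a seeding-based or penalty-based variant would be an equally plausible reading of the abstract.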