Fast Bayesian Haplotype Inference Via Context Tree Weighting

Authors:
Pasi Rastas;Jussi Kollin;Mikko Koivisto
Affiliations:
Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland FIN-00014;Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland FIN-00014;Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland FIN-00014
Venue:
WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics
Year:
2008

Abstract

We present a new Bayesian method for inferring haplotypes from unphased genotypes. The method can be viewed as a unification of ideas from variable-order Markov chain modelling and ensemble learning that have so far been implemented only separately in some state-of-the-art methods. Specifically, we use the Context Tree Weighting algorithm to efficiently compute the posterior probability of any given haplotype assignment; we employ a simulated annealing scheme to rapidly find several local optima of the posterior; and we sketch a full Bayesian analogue, in which a weighted sample of haplotype assignments is drawn to summarize the posterior distribution. We also show that the average switch distance to a given (weighted) sample of haplotype assignments, a popular measure of phasing accuracy, can be minimized in linear time. We demonstrate empirically that the presented method typically performs as well as the leading fast haplotype inference methods, and sometimes better. The methods are freely available in the computer program BACH (Bayesian Context-based Haplotyping).
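To make the role of Context Tree Weighting concrete, here is a minimal sketch of textbook binary CTW for a single 0/1 sequence. It is not BACH's haplotype model. Each context node mixes, with weight 1/2 each, its own Krichevsky-Trofimov (KT) estimate and the product of its two children's weighted probabilities. Leaves at the maximum depth use the KT estimate alone. The names `kt_block` and `ctw_probability` are hypothetical, and the first `depth` symbols are assumed to serve as the initial context.

```python
from fractions import Fraction
from itertools import product


def kt_block(zeros: int, ones: int) -> Fraction:
    """Krichevsky-Trofimov probability of any binary string with the given counts."""
    p, a, b = Fraction(1), 0, 0
    for _ in range(zeros):
        p *= Fraction(2 * a + 1, 2 * (a + b + 1))  # (a + 1/2) / (a + b + 1)
        a += 1
    for _ in range(ones):
        p *= Fraction(2 * b + 1, 2 * (a + b + 1))  # (b + 1/2) / (a + b + 1)
        b += 1
    return p


def ctw_probability(x: list[int], depth: int) -> Fraction:
    """Weighted CTW probability of x[depth:], given the first `depth` symbols as context."""
    # counts[node] = [number of 0s, number of 1s] observed after context `node`
    counts: dict[tuple[int, ...], list[int]] = {}
    for t in range(depth, len(x)):
        context = tuple(x[t - 1 - i] for i in range(depth))  # most recent symbol first
        for d in range(depth + 1):
            counts.setdefault(context[:d], [0, 0])[x[t]] += 1

    def weighted(node: tuple[int, ...]) -> Fraction:
        if node not in counts:
            return Fraction(1)  # an unvisited subtree contributes probability 1
        zeros, ones = counts[node]
        pe = kt_block(zeros, ones)
        if len(node) == depth:
            return pe  # leaf: KT estimate only
        # Mix "stop at this node" with "split on the next older symbol", weight 1/2 each.
        return (pe + weighted(node + (0,)) * weighted(node + (1,))) / 2

    return weighted(())


print(ctw_probability([0, 1, 0, 1], depth=1))  # 1/8
```

Exact `Fraction` arithmetic keeps the toy example checkable. A practical implementation would work in log space and update counts incrementally, so a single changed symbol only touches the `depth + 1` nodes on its context path.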
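One way the linear-time claim for switch distance can be realized is sketched below, under stated assumptions. Suppose one individual's phasing is encoded as the alleles on its first haplotype at its k heterozygous sites. Suppose further that switch distance counts the adjacent site pairs whose relative phase differs. The weighted average distance then decomposes into independent per-pair terms, and a weighted majority vote on "switch / no switch" at each pair is optimal. That vote costs O(k) per sampled phasing. The function names are hypothetical, and this is not necessarily the paper's exact procedure.

```python
def switch_bits(phase: list[int]) -> list[int]:
    """Relative phase of each adjacent pair of heterozygous sites (1 = phase switches)."""
    return [a ^ b for a, b in zip(phase, phase[1:])]


def switch_distance(p: list[int], q: list[int]) -> int:
    """Number of adjacent site pairs where the two phasings disagree on relative phase."""
    return sum(x != y for x, y in zip(switch_bits(p), switch_bits(q)))


def min_avg_switch_phase(sample: list[list[int]], weights: list[float]) -> list[int]:
    """Phasing minimizing the weighted average switch distance to a non-empty sample."""
    k = len(sample[0])
    vote = [0.0] * max(k - 1, 0)  # total weight voting for a switch at each pair
    total = sum(weights)
    for phase, w in zip(sample, weights):
        for i, s in enumerate(switch_bits(phase)):
            if s:
                vote[i] += w
    # Phasings are defined only up to swapping the two haplotypes, so fix site 0 to 0,
    # then switch exactly where the switching side holds a strict weighted majority.
    result = [0]
    for v in vote:
        result.append(result[-1] ^ (1 if 2 * v > total else 0))
    return result[:k]


sample = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print(min_avg_switch_phase(sample, [5, 3, 2]))  # [0, 0, 1, 1]
```

Because every sequence of switch decisions corresponds to a valid phasing (up to a global flip), the per-pair choices can be made independently without any feasibility check.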