Bayesian haplotype inference via the Dirichlet process

Authors:
Eric Xing; Roded Sharan; Michael I. Jordan
Affiliations:
University of California, Berkeley, CA; University of California, Berkeley, CA; University of California, Berkeley, CA
Venue:
ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning
Year:
2004

Abstract

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus, methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship. We apply our approach to the analysis of both simulated and real genotype data, and compare it to extant methods.
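
The abstract does not describe a sampling scheme, so the following is only a minimal sketch of why a Dirichlet process prior leaves the number of mixture components open. It uses the Chinese restaurant process, a standard representation of the Dirichlet process, and is not the authors' inference algorithm. Each item can be read as a chromosome and each cluster as a distinct haplotype in the pool; the function name, the concentration parameter `alpha`, and the toy sample size are illustrative assumptions.

```python
import random


def sample_crp_assignments(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an assumed value here):
    larger alpha makes opening a new cluster more likely.
    """
    assignments = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        assignments.append(k)
    return assignments, counts


rng = random.Random(0)  # fixed seed so the run is repeatable
labels, sizes = sample_crp_assignments(n_items=40, alpha=1.0, rng=rng)
print("number of distinct clusters:", len(sizes))
print("cluster sizes:", sizes)
```

The number of clusters is an outcome of the draw and not an input, which mirrors the abstract's point that the size of the haplotype pool is unknown and must be estimated together with the mixture.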