Stochastic Complexity in Statistical Inquiry Theory
Bayesian haplotype inference via the Dirichlet process
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Joint discovery of haplotype blocks and complex trait associations from SNP sequences
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture
ICML '06 Proceedings of the 23rd international conference on Machine learning
Learning MHC I—peptide binding
Bioinformatics
This work considers biological sequences whose composition exhibits combinatorial structure: groups of positions in the aligned sequences are "linked" and covary as a unit across sequences. When several such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology, but methods for analyzing them are still being developed. This paper presents a nonparametric prior on sequences that allows combinatorial structures to emerge and that induces a posterior distribution over factorized sequence representations. Experiments on three sequence datasets indicate that combinatorial structures are indeed present and that combinatorial sequence models describe them more succinctly than simpler mixture models. We conclude with an application to MHC binding prediction that highlights the utility of the posterior distribution induced by the prior. By integrating over the posterior, our method compares favorably to leading binding predictors.
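As a rough illustration of what "linked" position groups mean, the toy sketch below generates sequences in which each group of positions independently picks one pattern and fills its positions as a unit. It is not the paper's nonparametric prior: the grouping, the patterns, and the names `GROUPS`, `PATTERNS`, and `sample_sequences` are hypothetical, and the number of groups is fixed by hand rather than inferred.

```python
import random

# Hypothetical toy setup (not from the paper): a length-5 alignment whose
# positions are split into two linked groups. Each group has its own small
# set of patterns, and all positions in a group covary as one unit.
GROUPS = [(0, 3), (1, 2, 4)]
PATTERNS = [["AK", "GE"], ["LLV", "MIF", "TTS"]]


def sample_sequences(groups, patterns, n, seed=0):
    """Draw n sequences; every group picks one of its patterns independently."""
    rng = random.Random(seed)
    length = sum(len(g) for g in groups)
    seqs = []
    for _ in range(n):
        seq = [None] * length
        for positions, group_patterns in zip(groups, patterns):
            chosen = rng.choice(group_patterns)
            # Write the chosen pattern into the group's positions.
            for pos, ch in zip(positions, chosen):
                seq[pos] = ch
        seqs.append("".join(seq))
    return seqs


seqs = sample_sequences(GROUPS, PATTERNS, n=200)

# A factorized description stores 2 + 3 = 5 patterns, whereas a flat mixture
# over whole sequences needs up to 2 * 3 = 6 components to cover the same set.
factorized_size = sum(len(p) for p in PATTERNS)
flat_size_upper_bound = 1
for p in PATTERNS:
    flat_size_upper_bound *= len(p)

print(sorted(set(seqs)))
print(factorized_size, flat_size_upper_bound)
```

With more groups the gap widens, since the flat mixture grows with the product of the per-group pattern counts while the factorized description grows with their sum. This is one informal reading of why a combinatorial model can describe such data more succinctly than a simple mixture.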