Nonparametric combinatorial sequence models

  • Authors:
  • Fabian L. Wauthier; Michael I. Jordan; Nebojsa Jojic

  • Affiliations:
  • University of California, Berkeley; University of California, Berkeley; Microsoft Research, Redmond

  • Venue:
  • RECOMB'11: Proceedings of the 15th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2011

Abstract

This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology, but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences that allows combinatorial structures to emerge and that induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets, which indicate that combinatorial structures are indeed present and that combinatorial sequence models can describe them more succinctly than simpler mixture models. We conclude with an application to MHC binding prediction that highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.