The power of amnesia: learning probabilistic automata with variable memory length
Machine Learning - Special issue on COLT '94
A unified approach to word occurrence probabilities
Discrete Applied Mathematics - Special volume on combinatorial molecular biology
Modeling dependencies in protein-DNA binding sites
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Monte Carlo Statistical Methods (Springer Texts in Statistics)
Monte Carlo Statistical Methods (Springer Texts in Statistics)
Bayesian model learning based on a parallel MCMC strategy
Statistics and Computing
Computing exact P-values for DNA motifs
Bioinformatics
Efficient exact motif discovery
Bioinformatics
Hi-index | 0.00 |
Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. Examples of attempts to do probabilistic unsupervised learning about the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that targets to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.