The power of amnesia: learning probabilistic automata with variable memory length
Machine Learning - Special issue on COLT '94
Finding short DNA motifs using permuted Markov models
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Recently, Peres and Shields discovered a new method for estimating the order of a stationary fixed-order Markov chain [15]. They showed that the estimator is consistent by proving a threshold result. While this threshold is valid asymptotically, it is not very useful for DNA sequence analysis, where data sizes are moderate. In this paper we give a novel interpretation of the Peres-Shields estimator as a sharp transition phenomenon. This yields a precise and powerful estimator that quickly identifies the core dependencies in data. We show that it compares favorably to other estimators, especially in the presence of noise and/or variable dependencies. Motivated by this last point, we extend the Peres-Shields estimator to Variable Length Markov Chains. We give an application to the problem of detecting DNA sequence similarity using genomic signatures.
Abbreviations: Mk = Fixed order Markov model of order k, PST = Prediction suffix tree, MC = Markov chain, VLMC = Variable length Markov chain.
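To make the idea of a sharp transition concrete, the following is a minimal sketch of a fluctuation-type statistic computed for several candidate orders. It is an illustration under stated assumptions, not the definition from [15]: the function names `count_substrings` and `fluctuation`, the DNA alphabet default, and the exact form of the statistic (observed count of a context followed by a symbol, minus the count expected if the first symbol of the context were irrelevant) are our own choices and should be checked against the original papers.

```python
from collections import Counter


def count_substrings(seq, length):
    """Count every (overlapping) substring of the given length in seq."""
    return Counter(seq[i:i + length] for i in range(len(seq) - length + 1))


def fluctuation(seq, k, alphabet="ACGT"):
    """Hypothetical fluctuation statistic for candidate order k (k >= 1).

    For each context v of length k and each symbol a, compare the observed
    count N(va) with the count expected if only the last k-1 symbols of v
    mattered: N(v'a) * N(v) / N(v'), where v' is v without its first symbol.
    Returns the largest absolute deviation.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_k = count_substrings(seq, k)        # N(v) and N(v'a), both of length k
    n_k1 = count_substrings(seq, k + 1)   # N(va)
    n_km1 = count_substrings(seq, k - 1)  # N(v')
    worst = 0.0
    for v, n_v in n_k.items():
        suffix = v[1:]
        n_suffix = n_km1[suffix]          # > 0 because v itself occurs
        for a in alphabet:
            expected = n_k[suffix + a] * n_v / n_suffix
            worst = max(worst, abs(n_k1[v + a] - expected))
    return worst


if __name__ == "__main__":
    # Toy sequence in which each symbol is fully determined by the previous one.
    toy = "ACGT" * 50
    for k in (1, 2, 3):
        print(k, fluctuation(toy, k))
```

On the toy sequence the statistic is large at k = 1 and collapses once the candidate order reaches the true memory length. That drop is the kind of transition the abstract refers to, although real DNA data is noisy and the drop will be less clean.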