A probabilistic suffix tree (PST) is a stochastic model that uses a suffix tree as an index structure to store conditional probabilities associated with subsequences. PSTs have been used successfully to model and predict protein families using a global approach. The global approach takes the entire sequence into account and is therefore not suitable for partially conserved families. We develop two variants of PST for local prediction: multiple-domain prediction and best-domain prediction. The multiple-domain method predicts the probability that a protein belongs to a family based on one or more significant conserved regions, while the best-domain method bases the prediction on the most conserved region in the query sequence. The time complexity of both approaches is the same as that of global prediction, namely O(Lm), where L is the depth bound of the tree and m is the length of the query sequence. We tested our algorithms on the Pfam database of protein families and compared the results with the global prediction method. The experimental results show that our approaches predict more accurately than the global approach. We also show that our local prediction approach is an effective way to extract motifs/domains. Our approaches employ a linear-time method for building the PST, adapted from the linear-time construction of probabilistic automata reported by A. Apostolico et al.
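The contrast between global and local prediction can be illustrated with a small Python sketch. It is one plausible reading of the idea, not the algorithm from the paper. Everything in it is an assumption made for illustration: the function names, the plain dictionary of contexts standing in for a real suffix tree, add-one smoothing, a uniform background distribution, and the use of maximum-sum segments to define the "best" and "multiple" domains. The dictionary back-off is also not tuned to reproduce the O(Lm) bound.

```python
import math
from collections import defaultdict


def train_counts(sequences, depth):
    """Count next-symbol occurrences for every context of length 0..depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for k in range(min(depth, i) + 1):
                counts[seq[i - k:i]][sym] += 1
    return counts


def cond_prob(counts, ctx, sym, alphabet_size):
    """P(sym | longest stored suffix of ctx), with add-one smoothing (assumed)."""
    while ctx and ctx not in counts:
        ctx = ctx[1:]  # back off to a shorter suffix
    nxt = counts[ctx]
    return (nxt.get(sym, 0) + 1) / (sum(nxt.values()) + alphabet_size)


def position_scores(counts, query, alphabet_size, depth):
    """Per-position log-odds against a uniform background (assumed)."""
    scores = []
    for i, sym in enumerate(query):
        ctx = query[max(0, i - depth):i]
        p = cond_prob(counts, ctx, sym, alphabet_size)
        scores.append(math.log(p * alphabet_size))
    return scores


def global_score(scores):
    """Global prediction: every position of the query contributes."""
    return sum(scores)


def best_domain(scores):
    """Kadane's algorithm: best contiguous segment as (score, start, end)."""
    best = (float("-inf"), 0, 0)
    run, start = 0.0, 0
    for i, s in enumerate(scores):
        if run <= 0:
            run, start = s, i
        else:
            run += s
        if run > best[0]:
            best = (run, start, i + 1)
    return best


def multiple_domain(scores, threshold):
    """Sum the peak scores of separate positive runs that reach the threshold."""
    total, run, peak = 0.0, 0.0, 0.0
    for s in scores + [float("-inf")]:  # sentinel closes the final run
        run += s
        peak = max(peak, run)
        if run <= 0:
            if peak >= threshold:
                total += peak
            run, peak = 0.0, 0.0
    return total


# Toy data: a conserved "ACACAC" region flanked by unrelated residues.
model = train_counts(["ACACACAC", "ACACAC"], depth=2)
scores = position_scores(model, "GGGGACACACGGGG", alphabet_size=4, depth=2)
domain_score, domain_start, domain_end = best_domain(scores)
```

In this toy query the flanking residues pull the global score down, while the best-domain score depends only on the conserved region, which is also reported as the half-open interval `[domain_start, domain_end)`. This mirrors the abstract's point that local prediction suits partially conserved families and can double as a way to locate motifs/domains.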