Local prediction approach for protein classification using probabilistic suffix trees

Authors:
Zhaohui Sun;Jitender S. Deogun
Affiliations:
University of Nebraska - Lincoln, Lincoln, NE;University of Nebraska - Lincoln, Lincoln, NE
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Citing 3
Cited 0

Inference and minimization of hidden Markov chains

COLT '94 Proceedings of the seventh annual conference on Computational learning theory
The power of amnesia: learning probabilistic automata with variable memory length

Machine Learning - Special issue on COLT '94
SAM: SEQUENCE ALIGNMENT AND MODELING SOFTWARE SYSTEM

SAM: SEQUENCE ALIGNMENT AND MODELING SOFTWARE SYSTEM

Quantified Score

Hi-index	0.00

Visualization

Abstract

Probabilistic suffix tree (PST) is a stochastic model that uses a suffix tree as an index structure to store conditional probabilities associated with subsequences. PST has been successfully used to model and predict protein families following global approach. Their approach takes into account the entire sequence, and thus is not suitable for partially conserved families. We develop two variants of PST for local prediction: multiple-domain prediction and best-domain prediction. The multiple-domain method predicts the probability that a protein belongs to a family based on one or more significant conserved regions, while the best-domain method does it based on the most conserved region in the query sequence. The time complexity of both of our approaches is the same as that of the global prediction, that is, O(Lm) where L is the depth bound of the tree and m is the size of the query sequence. We tested our algorithms on the Pfam database of protein families and compared the results with the global prediction method. The experimental results show that our approaches have higher accuracy of prediction than that of global approach. We also show that, our local prediction approach is an effective way to extract motifs/domains. Our approaches employ a linear time method for building PST by adapting the linear time construction of Probabilistic Automata reported by A. Apostolico et al.