We apply Hidden Markov Models (HMMs) to the problem of statistical modeling and multiple sequence alignment of protein families. A variant of the Expectation Maximization (EM) algorithm known as the Viterbi algorithm is used to obtain the statistical model from the unaligned sequences. In a detailed series of experiments, we have taken 400 unaligned globin sequences, and produced a statistical model entirely automatically from the primary (unaligned) sequences. We use no prior knowledge of globin structure. Using this model, we obtained a multiple alignment of the 400 sequences and 225 other globin sequences that agrees almost perfectly with a structural alignment by Bashford et al. This model can also discriminate all these 625 globins from nonglobin protein sequences with greater than 99% accuracy, and can thus be used for database searches.
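As a rough illustration of the Viterbi step mentioned above, the following sketch computes the single most likely state path through a small HMM for a given observation sequence. It covers only this decoding step, not the re-estimation of model parameters from the best paths, and it is not the authors' implementation. The function name `viterbi`, the two states `A` and `B`, the symbols `x`, `y`, `z`, and all probabilities are hypothetical values chosen for illustration; a protein-family model would use a far larger state space and the 20-letter amino acid alphabet.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for obs."""
    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []  # one dict per later step: state -> best predecessor

    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = (score[prev] + log(trans_p[prev][s])
                            + log(emit_p[s][symbol]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical toy model (all values are assumptions for illustration).
states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}

path, logp = viterbi(("x", "y", "z"), states, start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

In a training loop of the kind the abstract describes, the best path found for each sequence would presumably be used to recount transitions and emissions and update the model; that loop is omitted here. Aligning each sequence to the same model via its best path is also what would yield a multiple alignment.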