Proceedings of the 2004 ACM symposium on Applied computing
Given a functionally heterogeneous set of proteins, such as a large superfamily or an entire database, two important problems in biology are the automated inference of subsets of functionally related proteins and the identification of functional regions and residues. The former is typically performed in an unsupervised, bottom-up manner, by clustering based on pairwise sequence similarity. The latter is performed independently, in a supervised, top-down manner, starting from functional sets that have already been identified by either biological or computational means. The two processes are nevertheless inextricably linked, because functional motifs and residues are related to the corresponding functional clusters. This paper introduces a high-performance, top-down clustering technique, and the corresponding system, that determines functionally related clusters and functional motifs by coupling a pattern discovery algorithm, a statistical framework for the analysis of discovered patterns, and a motif refinement method based on hidden Markov models. Results are reported for the G protein-coupled receptor superfamily. They show that a significant majority of well-known functional sets and biologically relevant motifs are correctly recovered, and that a majority of the important functional residues reported in the literature occur in the inferred functional motifs. The technique has relevant implications for functional clustering and could be used as a highly predictive aid to mutagenesis experiments.