Protein sequence data is abundant, yet derivation of structural features from sequence alone is generally restricted to prediction of domain architecture, secondary structure elements and motifs. Precise feature boundaries cannot be determined reliably, and it is unknown to what extent these features constitute fundamental building blocks of protein sequences, a question with particular relevance to protein folding. Here we propose a statistical approach that uses mutual information, a measure of association, to predict feature boundaries. In this approach, proteins are viewed as strings of adjacent, non-overlapping features, where each feature is a subsequence of the protein and the union of the features is the entire protein. Mutual information is measured between nearby amino acids along sequences, and low values indicate feature boundaries. These boundaries are then predicted using a flexible partitioning algorithm. The algorithms presented in this paper were tested on the GPCR protein family and its subfamilies. A comparison with segment boundaries implied indirectly by secondary structure prediction and expert knowledge demonstrates that the algorithm can statistically predict feature positions in protein sequences generically, without assumptions about the type of feature to be detected. The data used and the algorithms presented in this paper are available at flan.blm.cs.cmu.edu.
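The core idea, that low mutual information between nearby residues hints at a feature boundary, can be illustrated with a minimal sketch. The abstract does not specify the estimator, the distance between residues, or the partitioning algorithm, so everything below is an assumption made for illustration: the sequences are taken to be of equal length (for example, pre-aligned), mutual information is estimated from empirical counts between adjacent positions only, and a plain threshold stands in for the flexible partitioning step. The function names `mutual_information`, `adjacent_mi_profile` and `candidate_boundaries` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in bits) of a list of (a, b) symbol pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi


def adjacent_mi_profile(seqs):
    """MI between positions i and i+1, estimated across equal-length sequences.

    Assumption: sequences share a common coordinate system (e.g. an alignment).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [
        mutual_information([(s[i], s[i + 1]) for s in seqs])
        for i in range(length - 1)
    ]


def candidate_boundaries(profile, threshold):
    """Positions where adjacent-residue MI falls below the threshold.

    A returned value k means a boundary is proposed between residues k-1 and k.
    This thresholding is a simple stand-in, not the paper's partitioning method.
    """
    return [i + 1 for i, value in enumerate(profile) if value < threshold]


# Toy data: positions 0-1 vary together, positions 2-3 vary together,
# and the two blocks vary independently of each other.
toy = ["AAGG", "CCTT", "AATT", "CCGG"]
profile = adjacent_mi_profile(toy)
boundaries = candidate_boundaries(profile, threshold=0.5)
print(profile, boundaries)
```

On this toy input the profile dips between the two blocks, so the sketch proposes a single boundary there. With real protein families the estimates would be noisy for small sample sizes, and the threshold (here an arbitrary 0.5 bits) would need to be chosen with care.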