Gauss-integral based representation of protein structure for predicting the fold class from the sequence

Authors:
Bjørn G. Nielsen; Peter Røgen; Henrik G. Bohr
Affiliations:
Quantum Protein Centre (QuP), Department of Physics, Technical University of Denmark, Bldg. 309, DK-2800, Kongens Lyngby, Denmark; Department of Mathematics, Technical University of Denmark, Bldg. 303, DK-2800, Kongens Lyngby, Denmark; Quantum Protein Centre (QuP), Department of Physics, Technical University of Denmark, Bldg. 309, DK-2800, Kongens Lyngby, Denmark
Venue:
Mathematical and Computer Modelling: An International Journal
Year:
2006



Abstract

A representative subset of protein chains was selected from the CATH 2.4 database [C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, CATH - a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093-1108] and used to train a feed-forward neural network to predict protein fold classes. The network takes the dipeptide frequency matrix of a sequence as input and produces as output a novel representation of the protein chain in R^30, based on knot invariant values [P. Rogen, B. Fain, Automatic classification of protein structure by using Gauss integrals, Proceedings of the National Academy of Sciences of the United States of America 100 (1) (2003) 119-124; P. Rogen, H.G. Bohr, A new family of global protein shape descriptors, Mathematical Biosciences 182 (2) (2003) 167-181]. When singletons (proteins that are the only member of their topology or sequence-homology set) are excluded, the prediction success rates were 77% at the class level, 60% at the architecture level, and 48% at the topology level. The present data set includes about 500 fold classes, roughly ten times as many as reported in earlier attempts, which covered only a few handpicked folds; this result therefore represents an improvement on previous work. Furthermore, distance analysis of the network outputs for singletons shows that novel topologies can be detected with very high confidence (~85%), so in these cases the network can serve as a sorting mechanism that flags sequences which may need special attention. The same distance analysis also yields a direct measure of prediction confidence.
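The abstract names two concrete ingredients: a dipeptide frequency matrix as the network input, and a distance analysis of the network's R^30 outputs used for fold assignment and novelty detection. The sketch below shows one plausible reading of both; it is not the authors' implementation. The row-major 20x20 layout, normalisation by the number of adjacent pairs, Euclidean distance, nearest-reference assignment, and the `novelty_threshold` parameter are all assumptions, and the trained network itself is omitted (its output vector is taken as given). The function names `dipeptide_frequencies` and `classify` are hypothetical.

```python
import math

# 20 standard amino acids, one-letter codes (ordering is an assumption)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_frequencies(sequence: str) -> list[float]:
    """Hypothetical helper: flatten the 20x20 dipeptide frequency matrix to 400 values.

    Entry [i*20 + j] is the fraction of adjacent residue pairs (a_k, a_{k+1})
    equal to (AMINO_ACIDS[i], AMINO_ACIDS[j]). Pairs containing a non-standard
    residue are skipped (assumption). Returns all zeros if no pair is counted.
    """
    counts = [0.0] * (len(AMINO_ACIDS) ** 2)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        if a in INDEX and b in INDEX:
            counts[INDEX[a] * len(AMINO_ACIDS) + INDEX[b]] += 1
            total += 1
    if total:
        counts = [c / total for c in counts]
    return counts


def classify(output, references, novelty_threshold):
    """Hypothetical nearest-reference fold assignment with a novelty flag.

    output: network output vector (assumed to live in the same space as the
            references, e.g. a 30-dimensional Gauss-integral descriptor).
    references: dict mapping fold label -> reference vector.
    novelty_threshold: assumed distance cut-off above which the sequence is
            flagged as a possible novel topology.
    Returns (nearest_label, distance, is_novel).
    """
    label, dist = min(
        ((lab, math.dist(output, ref)) for lab, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return label, dist, dist > novelty_threshold
```

With real data, `references` would hold one R^30 vector per known fold class, and the threshold would need calibrating on held-out singletons; the distance returned alongside the label is the kind of quantity the abstract describes as a direct confidence measure.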