Alternate Representation of Distance Matrices for Characterization of Protein Structure

Authors:
Keith Marsolo;Srinivasan Parthasarathy
Affiliations:
Ohio State University;Ohio State University
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 6

Ten lectures on wavelets

C4.5: programs for machine learning

Approximation of protein structure for fast similarity measures
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology

Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures
Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology

Automatic Protein Structure Classification through Structural Fingerprinting
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering

A Multi-Level Approach to SCOP Fold Recognition
BIBE '05 Proceedings of the Fifth IEEE Symposium on Bioinformatics and Bioengineering

Cited 3

Structure-based querying of proteins using wavelets
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Fast SCOP Classification of Structural Class and Fold Using Secondary Structure Mining in Distance Matrix
PRIB '09 Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics

Efficient Approaches for Retrieving Protein Tertiary Structures
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

The most suitable method for the automated classification of protein structures remains an open problem in computational biology. To classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. The first represents the distances between the Cα atoms of a protein as a two-dimensional matrix and models the resulting surface with Zernike polynomials. The second uses a wavelet-based approach: we convert the distances between a protein's Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representations with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar but less challenging dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy by another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively.
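The wavelet-based representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet family is used, how the distance matrix is turned into a one-dimensional signal, or how many decomposition levels are applied, so the choices below are assumptions made only for illustration: a Haar wavelet, a row-by-row flattening of the strict upper triangle, two decomposition levels, and padding of odd-length signals by repeating the last sample. The function names and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def upper_triangle_signal(matrix):
    """Flatten the strict upper triangle row by row into a 1D signal.

    The ordering is an assumption; the abstract does not specify it.
    """
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def haar_approximation(signal, levels):
    """Return the approximation coefficients after `levels` Haar DWT steps.

    The Haar wavelet is an illustrative choice, not the paper's stated one.
    """
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            # Pad odd-length input by repeating the last sample (assumption).
            approx.append(approx[-1])
        approx = [
            (approx[k] + approx[k + 1]) / math.sqrt(2)
            for k in range(0, len(approx), 2)
        ]
    return approx


# Toy coordinates standing in for C-alpha positions (made up for this sketch).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
    (0.0, 3.8, 3.8),
]

signal = upper_triangle_signal(distance_matrix(coords))
features = haar_approximation(signal, levels=2)
print(len(signal), len(features))
```

Each Haar step roughly halves the signal length while keeping a smoothed summary of it. Because proteins have different numbers of residues, a fixed-length feature vector for a classifier would additionally require some form of length normalization, which this sketch does not attempt.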