A Multi-Level Approach to SCOP Fold Recognition
BIBE '05 Proceedings of the Fifth IEEE Symposium on Bioinformatics and Bioengineering
The most suitable method for the automated classification of protein structures remains an open problem in computational biology. To classify a protein structure accurately, an effective representation must be chosen. Here we present two methods of representing protein structure. The first represents the distances between the Cα atoms of a protein as a two-dimensional matrix and models the resulting surface with Zernike polynomials. The second uses a wavelet-based approach: we convert the distances between a protein's Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transform. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of each representation with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar but less challenging dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy by another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively.
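The wavelet-based representation described above can be illustrated with a minimal sketch. The abstract does not say which wavelet is used, how the distance matrix is flattened into a signal, or how many decomposition levels are kept, so the choices here (a Haar wavelet, a row-by-row reading of the upper triangle, two levels) and all function names are assumptions made purely for illustration.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (x, y, z)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def matrix_to_signal(dist):
    """Flatten the distance matrix into a 1-D signal.

    Assumption: read the upper triangle row by row. The abstract does not
    specify the ordering, so this is only one plausible choice.
    """
    n = len(dist)
    return [dist[i][j] for i in range(n) for j in range(i + 1, n)]


def haar_step(signal):
    """One level of an orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) / root2 for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / root2 for k in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(coords, levels=2):
    """Approximation coefficients after `levels` Haar steps, used as features.

    Assumption: the Haar wavelet and the default of two levels are
    placeholders, not values taken from the abstract.
    """
    signal = matrix_to_signal(distance_matrix(coords))
    for _ in range(levels):
        if len(signal) < 2:
            break
        signal, _detail = haar_step(signal)
    return signal


if __name__ == "__main__":
    # Toy "protein": five C-alpha atoms spaced one unit apart on a line.
    toy = [(float(i), 0.0, 0.0) for i in range(5)]
    print(wavelet_features(toy, levels=2))
```

Each decomposition level roughly halves the signal length, so the approximation coefficients give a compact, coarse summary of the distance profile. In practice the resulting vectors would also need a common length across proteins before being passed to a classifier, a step the abstract does not describe and this sketch leaves out.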