Structure-based querying of proteins using wavelets

Authors:
Keith Marsolo;Srinivasan Parthasarathy;Kotagiri Ramamohanarao
Affiliations:
The Ohio State University;The Ohio State University;University of Melbourne
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

The ability to retrieve molecules based on structural similarity has use in many applications, from disease diagnosis and treatment to drug discovery and design. In this paper, we present a method to represent protein molecules that allows for the fast, flexible and efficient retrieval of similar structures, based on either global or local attributes. We begin by computing the pair-wise distance between amino acids, transforming each 3D structure into a 2D distance matrix. We normalize this matrix to a specific size and apply a 2D wavelet decomposition to generate a set of approximation coefficients, which serves as our global feature vector. This transformation reduces the overall dimensionality of the data while still preserving spatial features and correlations. We test our method by running queries on three different protein data sets that have been used previously in the literature, basing our comparisons on labels taken from the SCOP database. We find that our method significantly outperforms existing approaches, in terms of retrieval accuracy, memory utilization and execution time. Specifically, using a k-d tree and running a 10-nearest-neighbor search on a dataset of 33,000 proteins against itself, we see an average accuracy of 89% at the SCOP SuperFamily level and a total query time that is up to 350 times faster than previously published techniques. In addition to processing queries based on global similarity, we also propose innovative extensions to effectively match proteins based solely on shared local substructures, allowing for a more flexible query interface.
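
The pipeline described in the abstract (distance matrix, size normalization, 2D wavelet approximation, nearest-neighbor query) can be illustrated with a minimal sketch. Several details here are assumptions, since the abstract does not specify them: the Haar wavelet, the 16x16 normalized size, the two decomposition levels, and nearest-neighbor resampling for the normalization step. A brute-force search stands in for the k-d tree to keep the example dependency-free. All function names are hypothetical and are not taken from the paper.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def resize(matrix, size):
    """Resample a square matrix to size x size by nearest-neighbor sampling.

    Assumed normalization; the abstract only says the matrix is normalized
    to a specific size.
    """
    n = len(matrix)
    idx = [i * n // size for i in range(size)]
    return [[matrix[r][c] for c in idx] for r in idx]

def haar_approximation(matrix, levels):
    """Keep only the approximation band of a 2D Haar decomposition.

    Each level replaces every 2x2 block by (a + b + c + d) / 2, the
    orthonormal Haar approximation coefficient, halving both dimensions.
    """
    for _ in range(levels):
        half = len(matrix) // 2
        matrix = [
            [
                (matrix[2 * i][2 * j] + matrix[2 * i][2 * j + 1]
                 + matrix[2 * i + 1][2 * j] + matrix[2 * i + 1][2 * j + 1]) / 2.0
                for j in range(half)
            ]
            for i in range(half)
        ]
    return matrix

def feature_vector(coords, size=16, levels=2):
    """Flatten the approximation coefficients into a global feature vector."""
    approx = haar_approximation(resize(distance_matrix(coords), size), levels)
    return [value for row in approx for value in row]

def nearest_neighbors(query, database, k):
    """Brute-force k-NN by Euclidean distance (stand-in for a k-d tree)."""
    ranked = sorted(range(len(database)),
                    key=lambda i: math.dist(query, database[i]))
    return ranked[:k]

# Toy "structures": a straight chain, a helix, and a translated copy of the chain.
line = [(float(i), 0.0, 0.0) for i in range(20)]
helix = [(math.cos(i), math.sin(i), 0.3 * i) for i in range(20)]
line_shifted = [(float(i) + 5.0, 1.0, 2.0) for i in range(20)]

database = [feature_vector(c) for c in (line, helix)]
hits = nearest_neighbors(feature_vector(line_shifted), database, k=1)
```

Because the feature vector is built from inter-residue distances, it does not change when a structure is translated, so the shifted chain retrieves the original chain as its nearest neighbor. With a 16x16 matrix and two Haar levels, each structure is reduced to 16 coefficients, which illustrates the dimensionality reduction the abstract refers to.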