Functionally annotating protein structures of unknown function is one of the important challenges in bioinformatics. One informatics approach to predicting the function of a protein is to analyze the functions of other, structurally similar proteins. The ability to search for and retrieve similar protein structures in large datasets is crucial to this approach. Here, we propose a novel approach for efficient protein structure search in which protein structures are represented as vectors by the 3D Zernike Descriptor (3DZD). The 3DZD encoding compactly represents the surface shape of a protein's tertiary structure. This simplified representation accelerates structural search from about a day to a matter of seconds. However, further speedup is required for scenarios in which multiple users access the database at the same time. We address this need by applying fast k-nearest neighbor algorithms to the 3DZDs. In addition, we introduce an extended approach for protein structure search based on methods that exploit the characteristics of the 3DZD. Experiments show that search time was reduced by 75.41% with the fast k-nearest neighbor algorithm, 88.7% with the extended fast k-nearest neighbor algorithm, 88.84% with the fast threshold-based nearest neighbor algorithm, and 91.53% with the fast extended threshold-based nearest neighbor algorithm. In a simulated test case, the extended threshold-based algorithm, which had the highest speed improvement in the initial test case, showed a speed improvement of up to 87.48% compared to linear scan.
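The abstract does not spell out how the fast k-nearest neighbor and threshold-based searches work, so the following is only a minimal sketch of the general idea, not the authors' algorithms. It assumes that each protein is already encoded as a fixed-length numeric vector, that similarity is measured by Euclidean distance, and that pruning is done by partial-distance early abandoning, a generic technique swapped in here for illustration. All function names are hypothetical.

```python
import heapq
import math


def squared_distance(a, b, bound=math.inf):
    """Squared Euclidean distance between a and b.

    Returns math.inf as soon as the running sum reaches `bound`, so a
    hopeless candidate is abandoned without reading all its components.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= bound:
            return math.inf
    return total


def knn_linear_scan(query, database, k):
    """Baseline: compute every full distance, return the k nearest indices."""
    scored = ((squared_distance(query, v), i) for i, v in enumerate(database))
    return [i for _, i in heapq.nsmallest(k, scored)]


def knn_early_abandon(query, database, k):
    """Same result as the baseline, but prunes with the current k-th best."""
    if k <= 0:
        return []
    heap = []  # max-heap through negation: entries are (-sq_dist, -index)
    for i, v in enumerate(database):
        # Once k candidates are held, the worst of them is the pruning bound.
        bound = -heap[0][0] if len(heap) == k else math.inf
        d = squared_distance(query, v, bound)
        if d == math.inf:
            continue  # abandoned: cannot beat the current k-th best
        if len(heap) < k:
            heapq.heappush(heap, (-d, -i))
        else:
            heapq.heapreplace(heap, (-d, -i))
    # Sort nearest first, breaking ties by the smaller index.
    return [-neg_i for _, neg_i in sorted(heap, reverse=True)]


def within_threshold(query, database, radius):
    """Indices of all vectors strictly closer than `radius`, nearest first."""
    limit = radius * radius
    hits = []
    for i, v in enumerate(database):
        d = squared_distance(query, v, limit)
        if d != math.inf:
            hits.append((d, i))
    return [i for _, i in sorted(hits)]
```

Calling `knn_early_abandon(query, database, 5)` returns the same indices as `knn_linear_scan(query, database, 5)`; the only difference is that most candidates are rejected after a few components once a tight bound is known. The threshold variant has a fixed bound from the start, so it can prune from the very first candidate, which is one plausible reason a threshold-based search might outperform a k-nearest neighbor search.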