Searching protein 3-D structures for optimal structure alignment using intelligent algorithms and data structures

Authors:
Tomáš Novosád;Václav Snášel;Ajith Abraham;Jack Y. Yang
Affiliations:
Department of Computer Science, Vysoká Skola Báňská, Technical University of Ostrava, Ostrava, Czech Republic;Department of Computer Science, Vysoká Skola Báňská, Technical University of Ostrava, Ostrava, Czech Republic;Machine Intelligence Research Labs, Auburn, WA;Harvard University, Cambridge, MA
Venue:
IEEE Transactions on Information Technology in Biomedicine
Year:
2010



Abstract

In this paper, we present a novel algorithm for measuring protein similarity based on 3-D structure (protein tertiary structure). The algorithm uses a suffix tree to discover common parts of the main chains of all proteins in the current Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). From these common parts, we build a vector model and apply classical information retrieval (IR) algorithms based on that model to measure all-to-all protein similarity. To calculate protein similarity, we use the term frequency × inverse document frequency (tf × idf) term-weighting scheme and the cosine similarity measure. The goal of this paper is to introduce a new protein similarity metric based on suffix trees and IR methods. The whole current PDB database was used to demonstrate the very good time complexity of the algorithm as well as its high precision. We chose the Structural Classification of Proteins (SCOP) database to verify the precision of our algorithm because it is maintained primarily by humans. A further outcome of this paper would be the ability to determine, with nearly 100% precision, the SCOP categories of proteins not included in the latest version of the SCOP database (v. 1.75).
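
The pipeline outlined in the abstract (shared substrings as terms, tf × idf weighting, cosine similarity) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' method: the abstract does not say how main chains are encoded, so each protein is represented by a hypothetical short string of symbols; shared substrings are found by brute-force enumeration as a stand-in for the suffix tree, which is only practical for toy inputs; and the idf variant log(N / df) is one common choice, not necessarily the one used in the paper. All function names are illustrative.

```python
import math
from collections import Counter


def shared_substrings(docs, min_len=2):
    """Map each substring (length >= min_len) found in at least two docs
    to its document frequency.

    Assumption: brute-force stand-in for a generalized suffix tree.
    """
    doc_freq = Counter()
    for doc in docs:
        # Collect the distinct substrings of this doc so each counts once.
        seen = {
            doc[i:j]
            for i in range(len(doc))
            for j in range(i + min_len, len(doc) + 1)
        }
        doc_freq.update(seen)
    return {s: df for s, df in doc_freq.items() if df >= 2}


def count_occurrences(doc, term):
    """Count (possibly overlapping) occurrences of term in doc."""
    return sum(
        1 for i in range(len(doc) - len(term) + 1) if doc.startswith(term, i)
    )


def tfidf_vectors(docs, min_len=2):
    """Build one tf x idf vector per doc over the shared substrings."""
    doc_freq = shared_substrings(docs, min_len)
    terms = sorted(doc_freq)
    n = len(docs)
    vectors = [
        [count_occurrences(doc, t) * math.log(n / doc_freq[t]) for t in terms]
        for doc in docs
    ]
    return terms, vectors


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def all_to_all(docs, min_len=2):
    """Return the full pairwise similarity matrix for the given docs."""
    _, vectors = tfidf_vectors(docs, min_len)
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical encoded main chains (toy data, not real PDB entries).
chains = ["ABCABD", "ABCXYZ", "XYZXYW"]
similarity = all_to_all(chains)
```

In this toy input the first and third strings share no substring of length two or more, so their similarity is zero, while the second string overlaps with both. With log(N / df) weighting, a substring present in every document receives weight zero, which is the usual tf × idf behaviour for uninformative terms.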