PSIST: A scalable approach to indexing protein structures using suffix trees

Authors:
Feng Gao;Mohammed J. Zaki
Affiliations:
Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA;Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2008

Abstract

Indexing protein structures so that structures similar to a query can be searched quickly and scalably has important applications, including protein structure and function prediction, protein classification, and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. The resulting feature vectors are converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index is used to retrieve the maximal matches, which are then chained to obtain local alignments. Finally, similar proteins are retrieved by their alignment score against the query. Our results show classification accuracy of up to 50% at the topology level and 92.9% at the class level of the CATH classification, outperforming the best previous methods. We also show that PSIST is highly scalable owing to the external suffix tree indexing approach it uses: it is able to index about 70,500 domains from SCOP in under an hour.
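The pipeline described in the abstract (features to symbols, maximal matches, chaining, score) can be illustrated with a minimal sketch. This is not the authors' implementation. The uniform binning in `discretize`, the `bin_width` and `min_len` parameters, and the length-sum chain score are all assumptions made for illustration. For brevity, a quadratic scan stands in for the suffix tree lookup; a suffix tree would return the same maximal matches without comparing every pair of positions.

```python
from math import floor


def discretize(features, bin_width):
    """Map each real-valued feature to an integer symbol by uniform binning.

    Assumption: one scalar feature per residue; the paper's actual
    feature vectors and normalization are not reproduced here.
    """
    return [floor(x / bin_width) for x in features]


def maximal_matches(query, target, min_len):
    """Return (q_start, t_start, length) for each exact match of at least
    min_len symbols that cannot be extended to the left or to the right.

    Assumption: a naive O(n * m) scan replaces the suffix tree index.
    """
    matches = []
    n, m = len(query), len(target)
    for i in range(n):
        for j in range(m):
            if query[i] != target[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and query[i - 1] == target[j - 1]:
                continue
            # Extend to the right as far as the symbols agree.
            k = 0
            while i + k < n and j + k < m and query[i + k] == target[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


def chain_score(matches):
    """Best total length over chains of non-overlapping matches that keep
    the same order in both query and target (hypothetical scoring)."""
    matches = sorted(matches)
    best = []
    for idx, (qi, ti, length) in enumerate(matches):
        prev = 0
        for p in range(idx):
            qp, tp, lp = matches[p]
            # Match p may precede the current match only if it ends
            # before the current one starts, in both sequences.
            if qp + lp <= qi and tp + lp <= ti:
                prev = max(prev, best[p])
        best.append(prev + length)
    return max(best, default=0)


if __name__ == "__main__":
    query = [1, 2, 3, 9, 4, 5, 6]
    target = [1, 2, 3, 7, 7, 4, 5, 6]
    found = maximal_matches(query, target, min_len=3)
    print(found, chain_score(found))
```

In a retrieval setting, each database protein would be scored against the query in this way and the highest-scoring proteins returned; how PSIST weights matches and gaps is not specified in the abstract, so the plain length sum here is only a placeholder.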