PSIST: A scalable approach to indexing protein structures using suffix trees

  • Authors:
  • Feng Gao;Mohammed J. Zaki

  • Affiliations:
  • Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA;Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Approaches for indexing proteins and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are in turn converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index can be used effectively to retrieve the maximal matches, which are then chained to obtain the local alignments. Finally, similar proteins are retrieved by their alignment score against the query. Our results show classification accuracy up to 50% and 92.9% at the topology and class level according to the CATH classification. These results outperform the best previous methods. We also show that PSIST is highly scalable due to the external suffix tree indexing approach it uses; it is able to index about 70,500 domains from SCOP in under an hour.