Efficiently querying protein sequences with the Proteinus index

Authors:
Felipe Alves da Louza; Ricardo Rodrigues Ciferri; Cristina Dutra de Aguiar Ciferri
Affiliations:
Department of Computer Science, University of São Paulo, São Carlos, SP, Brazil; Department of Computer Science, Federal University of São Carlos, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo, São Carlos, SP, Brazil
Venue:
BSB'11: Proceedings of the 6th Brazilian Conference on Advances in Bioinformatics and Computational Biology
Year:
2011

Abstract

Finding similarities among protein sequences is a core problem in bioinformatics. It is the first step in the functional characterization of novel protein sequences and is also used in protein evolution studies and in predicting biological structure. In this paper, we propose Proteinus, a new index for similarity search over protein sequences. Proteinus represents protein sequences with a reduced amino acid alphabet, stores the index persistently on disk, and supports range queries. In performance tests with real-world protein sequences, Proteinus delivered a performance gain of 45% to 93% over the BLASTP tool for range query processing.
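
The two ideas named in the abstract, a reduced amino acid alphabet and range queries, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which amino acid grouping or which distance function Proteinus uses, so the six-group alphabet, the edit distance, and the names `reduce_sequence`, `edit_distance`, and `range_query` are hypothetical. The sketch scans the data linearly and does not build an index, so it shows only what a range query returns, not how Proteinus answers it.

```python
# Hypothetical reduced alphabet: six illustrative physicochemical groups.
# This is NOT necessarily the grouping used by Proteinus.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}
# Map each of the 20 amino acids to the representative letter of its group.
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def edit_distance(a, b):
    """Levenshtein distance, used here as an assumed stand-in metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def range_query(database, query, radius):
    """Return names of sequences within `radius` of `query`.

    Distances are computed on reduced sequences by a linear scan;
    a real index would avoid visiting every entry.
    """
    q = reduce_sequence(query)
    return [name for name, seq in database.items()
            if edit_distance(reduce_sequence(seq), q) <= radius]
```

For example, under this hypothetical grouping the sequences `MKVLD` and `IRLVE` reduce to the same string, so a range query with radius 0 around either one returns both. Collapsing similar residues this way shrinks the alphabet the index must handle, at the cost of treating some distinct sequences as identical.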