IEEE Transactions on Information Technology in Biomedicine
Finding similarities in protein sequences is a core problem in bioinformatics. It is the first step in the functional characterization of novel protein sequences, and is also used in protein evolution studies and in predicting biological structure. In this paper, we propose Proteinus, a new index for similarity search of protein sequences. Proteinus represents protein sequences with a reduced amino acid alphabet, stores the index persistently on disk, and supports range queries. Performance tests with real-world protein sequences showed that Proteinus is efficient: compared with the BLASTP tool, it achieved a performance gain of 45% to 93% for range query processing.
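To make the two ideas named in the abstract concrete, the following Python sketch shows what a reduced amino acid alphabet and a range query could look like. It is an illustration under stated assumptions, not the Proteinus index: the six-class grouping in `REDUCED_GROUPS` is hypothetical, edit distance is assumed as the similarity measure, and the query is answered by a linear scan in memory rather than by a disk-based index. All function names are invented for this example.

```python
# Hypothetical grouping of the 20 standard amino acids into 6 classes.
# This grouping is an assumption for illustration only; it is not the
# reduced alphabet used by Proteinus.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}

# Invert the grouping: each amino acid maps to its class representative.
TO_REDUCED = {
    aa: rep for rep, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence over the reduced alphabet.

    Symbols outside the 20 standard amino acids are mapped to "X".
    """
    return "".join(TO_REDUCED.get(aa, "X") for aa in seq.upper())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_query(database: list[str], query: str, radius: int) -> list[str]:
    """Return every sequence whose reduced form lies within `radius`
    edits of the reduced query (linear scan, no index)."""
    reduced_query = reduce_sequence(query)
    return [
        seq for seq in database
        if edit_distance(reduce_sequence(seq), reduced_query) <= radius
    ]
```

For instance, `range_query(["MKVLD", "IRLVE", "GGGGG"], "MKVLD", 0)` keeps the first two sequences, because both collapse to the same reduced string under this hypothetical grouping, while the third does not. An index such as the one proposed in the paper would aim to answer the same kind of query without scanning every sequence.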