Efficiently querying protein sequences with the Proteinus index

  • Authors:
  • Felipe Alves da Louza; Ricardo Rodrigues Ciferri; Cristina Dutra de Aguiar Ciferri

  • Affiliations:
  • Department of Computer Science, University of São Paulo, São Carlos, SP, Brazil; Department of Computer Science, Federal University of São Carlos, São Carlos, SP, Brazil; Department of Computer Science, University of São Paulo, São Carlos, SP, Brazil

  • Venue:
  • BSB'11: Proceedings of the 6th Brazilian Conference on Advances in Bioinformatics and Computational Biology
  • Year:
  • 2011

Abstract

Finding similarities among protein sequences is a core problem in bioinformatics. It is the first step in the functional characterization of novel protein sequences, and it is also used in protein evolution studies and in the prediction of biological structure. In this paper, we propose Proteinus, a new index for similarity search over protein sequences. Proteinus represents protein sequences with a reduced amino acid alphabet, stores the index persistently on disk, and supports range queries. Performance tests with real-world protein sequences showed that the Proteinus index is very efficient: compared with the BLASTP tool, Proteinus achieved performance gains ranging from 45% to 93% in range query processing.
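The abstract names two ingredients, a reduced amino acid alphabet and range queries, without giving their details. The Python sketch below illustrates both under stated assumptions. The six-group alphabet (`REDUCED_GROUPS`) is a hypothetical physicochemical grouping, not necessarily the one Proteinus uses, and a range query is assumed to return every sequence whose edit distance to the query, measured over the reduced alphabet, is at most a radius. The helper names (`reduce_sequence`, `edit_distance`, `range_query`) are illustrative only. The scan is brute force and simply pins down what a range query returns; it does not reproduce the disk-based Proteinus index.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL reduced alphabet (assumption, not taken from the paper):
# the 20 standard amino acids collapsed into 6 physicochemical groups.
REDUCED_GROUPS: Dict[str, str] = {
    "AVLIMC": "a",  # aliphatic / hydrophobic
    "FWYH": "r",    # aromatic
    "STNQ": "p",    # polar, uncharged
    "KR": "b",      # basic
    "DE": "c",      # acidic
    "GP": "s",      # special backbone conformations
}
# Flatten the groups into a per-residue lookup table.
REDUCE_MAP: Dict[str, str] = {
    aa: code for group, code in REDUCED_GROUPS.items() for aa in group
}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence over the reduced alphabet.

    Non-standard residues (e.g. X, B, Z) map to the catch-all symbol 'x'.
    """
    return "".join(REDUCE_MAP.get(aa, "x") for aa in seq.upper())


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1]


def range_query(database: Dict[str, str], query: str,
                radius: int) -> List[Tuple[str, int]]:
    """Return (id, distance) for every sequence within `radius` of `query`.

    Distances are computed on the reduced alphabet by a linear scan;
    results are sorted by distance, then by id.
    """
    q = reduce_sequence(query)
    hits = []
    for seq_id, seq in database.items():
        d = edit_distance(reduce_sequence(seq), q)
        if d <= radius:
            hits.append((seq_id, d))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    db = {"p1": "MKTAYIAKQR", "p2": "MKSAYVAKNR", "p3": "GGGGPPPPDE"}
    print(range_query(db, "MKTAYIAKQR", 1))  # [('p1', 0), ('p2', 0)]
```

In the usage example, p2 differs from p1 by three substitutions (T→S, I→V, Q→N), but each substituted pair falls in the same hypothetical group, so the two reduced sequences are identical and both match at distance 0. This is the intuition behind reduced alphabets: conservative substitutions stop counting as mismatches, at the cost of some ability to discriminate between residues.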