On Integrating Peptide Sequence Analysis and Relational Distance-Based Indexing

Authors:
Weijia Xu;Rui Mao;Shu Wang;Daniel P. Miranker
Affiliations:
University of Texas at Austin;University of Texas at Austin;University of Texas at Austin;University of Texas at Austin
Venue:
BIBE '06 Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering
Year:
2006

Citing 0
Cited 2

Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management

Empirical evaluation of excluded middle vantage point forest on biological sequences workload
Proceedings of the 1st Workshop on New Trends in Similarity Search

Abstract

Managing data with distance-based indexing methods has the potential to provide scalability and integration with relational database management systems and the SQL programming model. We previously demonstrated the advantages of such an approach for nucleotide sequences using Hamming distance (mismatch count). However, the larger alphabet of peptide sequences increases the dimensionality of the problem, which makes good algorithmic results harder to obtain. The development of a metric PAM substitution matrix (mPAM) enables metric-distance-based indexing for peptide sequences. The performance of distance-based indexing for homologous protein retrieval entails a trade-off among accuracy, scalability, and computational cost. We investigate the application of the multi-vantage point (MVP) tree algorithm to index peptide k-mers based on global mPAM alignment. We show that k-mer retrieval maintains accuracy when k is at least 6, which creates a domain of over 60 million key values (20^6 = 64 million for the 20 standard amino acids) and enables scalability sufficient for effective performance on large disk-resident sequence databases.
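
To make the idea of indexing peptide k-mers under a metric distance concrete, the following is a minimal sketch and not the authors' implementation. It makes two loudly stated substitutions: it uses plain Hamming distance as a stand-in for the mPAM-based distance (the mPAM matrix values are not given here), and it builds a simple single-vantage-point tree rather than the MVP tree described in the abstract. All names (`kmers`, `hamming`, `VPNode`, `build`, `range_query`) are hypothetical and chosen for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmers(sequence, k):
    """Return all overlapping k-mers of a peptide sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def hamming(a, b):
    """Mismatch count between equal-length strings.

    Assumption: used here only as a stand-in metric for the mPAM distance.
    """
    return sum(x != y for x, y in zip(a, b))


class VPNode:
    """One node of a single-vantage-point tree (not the MVP tree itself)."""

    def __init__(self, vantage, radius, inside, outside):
        self.vantage = vantage    # the pivot key stored at this node
        self.radius = radius      # median distance from pivot to its subtree
        self.inside = inside      # keys with distance <= radius
        self.outside = outside    # keys with distance > radius


def build(points, dist, rng):
    """Recursively partition keys around randomly chosen vantage points."""
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0, None, None)
    distances = sorted(dist(vantage, p) for p in points)
    radius = distances[len(distances) // 2]
    inside = [p for p in points if dist(vantage, p) <= radius]
    outside = [p for p in points if dist(vantage, p) > radius]
    return VPNode(vantage, radius,
                  build(inside, dist, rng), build(outside, dist, rng))


def range_query(node, query, r, dist, out):
    """Collect every key within distance r of query.

    Pruning relies on the triangle inequality, which is why the
    distance function must be a metric.
    """
    if node is None:
        return out
    d = dist(query, node.vantage)
    if d <= r:
        out.append(node.vantage)
    if d - r <= node.radius:   # query ball can reach the inside partition
        range_query(node.inside, query, r, dist, out)
    if d + r > node.radius:    # query ball can reach the outside partition
        range_query(node.outside, query, r, dist, out)
    return out
```

A possible usage, again purely illustrative: extract the distinct 6-mers of a set of peptide sequences, call `build(keys, hamming, random.Random(0))`, and then call `range_query(tree, query, 2, hamming, [])` to retrieve all indexed 6-mers within two mismatches of `query`. The key domain for k = 6 over the 20-letter alphabet is `len(AMINO_ACIDS) ** 6`, which is the 64 million figure behind the abstract's "over 60 million key values".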