Rapid Homology Search with Neighbor Seeds

Authors:
Miklos Csuros;Bin Ma
Affiliations:
Department of Computer Science and Operations Research, Universite de Montreal, C.P. 6128, succ. Centre-Ville, Montreal, Quebec, H3C 3J7, Canada;Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
Venue:
Algorithmica
Year:
2007

Citing 0
Cited 2

Amino Acid Classification and Hash Seeds for Homology Search
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology

On Subset Seeds for Protein Alignment
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Using a seed to rapidly "hit" possible homologies for further scrutiny is a common practice to speed up homology search in molecular sequences. It has been shown that a collection of higher-weight seeds has better sensitivity than a single lower-weight seed at the same speed. However, the huge memory requirements of high-weight seeds diminish their advantages. This paper describes a two-stage extension method that simulates high-weight seeds with modest memory requirements. Drawing on the previously studied ideas of vector seeds and multiple seeds, we introduce neighbor seeds, which are implemented using two-stage extension. Neighbor seeds provide the flexibility to maximize the independence between the seeds, which is a well-known criterion for optimizing sensitivity. A major advantage of neighbor seeds is that they all rely on the same pre-built database index. Based on considerations of sensitivity and biological adequacy, different neighbor seeds can therefore be used for different queries without rebuilding the index. The paper also discusses other practical techniques to reduce memory usage.
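
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of two-stage extension, not the authors' implementation. It assumes exact-match spaced seeds on nucleotide strings: a low-weight base seed is indexed once, and each "neighbor seed" is simulated by checking a few extra positions around every base hit. The function names (`build_index`, `neighbor_hits`) and the offset-tuple representation of seeds are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def seed_key(seq, pos, offsets):
    """Concatenate the characters of seq sampled at pos + each offset."""
    return "".join(seq[pos + o] for o in offsets)


def build_index(db, base_offsets):
    """Index every database position under the low-weight base seed.

    base_offsets are the 'care' positions of the base seed, e.g. (0, 1, 3)
    for the pattern 1101. This index is built once and shared by all
    neighbor seeds (assumption: exact matching at care positions).
    """
    span = max(base_offsets) + 1
    index = defaultdict(list)
    for i in range(len(db) - span + 1):
        index[seed_key(db, i, base_offsets)].append(i)
    return index


def neighbor_hits(db, query, index, base_offsets, extra_offsets):
    """Two-stage hit detection.

    Stage 1: look up base-seed matches in the shared index.
    Stage 2: keep a match only if the extra positions also agree.
    extra_offsets are relative to the start of the base hit and may be
    negative; together with base_offsets they act like a higher-weight seed.
    """
    span = max(base_offsets) + 1
    lo = min(0, min(extra_offsets, default=0))
    hi = max(span - 1, max(extra_offsets, default=0))
    hits = []
    for q in range(len(query) - span + 1):
        for d in index.get(seed_key(query, q, base_offsets), ()):
            # Skip hits whose neighborhood falls off either sequence.
            if q + lo < 0 or d + lo < 0:
                continue
            if q + hi >= len(query) or d + hi >= len(db):
                continue
            if all(query[q + o] == db[d + o] for o in extra_offsets):
                hits.append((q, d))
    return hits


if __name__ == "__main__":
    db = "TTACGTACGGA"
    query = "ACGTACCC"
    base = (0, 1, 3)                      # pattern 1101, weight 3
    index = build_index(db, base)         # built once
    # Two different neighbor seeds reuse the same index.
    print(neighbor_hits(db, query, index, base, (4, 5)))
    print(neighbor_hits(db, query, index, base, (-1, 2)))
```

In this sketch memory grows only with the weight of the base seed, while the second stage filters hits as a heavier seed would. Swapping `extra_offsets` per query changes the effective seed without touching the index, which is the property the abstract highlights.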