Rapid Homology Search with Neighbor Seeds

  • Authors:
  • Miklos Csuros; Bin Ma

  • Affiliations:
  • Department of Computer Science and Operations Research, Universite de Montreal, C.P. 6128, succ. Centre-Ville, Montreal, Quebec, H3C 3J7, Canada; Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada

  • Venue:
  • Algorithmica
  • Year:
  • 2007

Abstract

Using a seed to rapidly "hit" possible homologies for further scrutiny is a common way to speed up homology search in molecular sequences. It has been shown that a collection of higher-weight seeds has better sensitivity than a single lower-weight seed at the same speed. However, the huge memory requirements of high-weight seeds diminish their advantages. This paper describes a two-stage extension method that simulates high-weight seeds with modest memory requirements. Drawing on the previously studied ideas of vector seeds and multiple seeds, we introduce neighbor seeds, which are implemented using two-stage extension. Neighbor seeds provide the flexibility to maximize the independence between the seeds, a well-known criterion for optimizing sensitivity. A major advantage of neighbor seeds is that they all rely on the same pre-built database index: based on considerations of sensitivity and biological adequacy, different neighbor seeds can be used for different queries without rebuilding the index. The paper also discusses other practical techniques for reducing memory usage.
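
The abstract gives no implementation details, so the following is only a rough sketch of what a two-stage seed hit could look like, not the authors' method. It assumes a low-weight base seed written as a 0/1 pattern, a single index built once from that base seed, and a second stage that requires matches at a few additional offsets near each base hit. The names build_index, find_hits and extra_offsets are hypothetical, as are the toy sequences.

```python
from collections import defaultdict

def build_index(db, base_seed):
    """Index each database position by the letters under the '1' positions
    of the base seed (a 0/1 pattern such as "101")."""
    span = len(base_seed)
    care = [i for i, c in enumerate(base_seed) if c == "1"]
    index = defaultdict(list)
    for pos in range(len(db) - span + 1):
        key = "".join(db[pos + i] for i in care)
        index[key].append(pos)
    return index, care

def find_hits(query, db, index, care, span, extra_offsets):
    """Stage 1: look up base-seed hits in the shared index.
    Stage 2: keep a hit only if query and database also match at every
    offset in extra_offsets (relative to the start of the hit)."""
    hits = []
    for q in range(len(query) - span + 1):
        key = "".join(query[q + i] for i in care)
        for d in index.get(key, ()):
            ok = True
            for off in extra_offsets:
                qi, di = q + off, d + off
                in_range = 0 <= qi < len(query) and 0 <= di < len(db)
                if not in_range or query[qi] != db[di]:
                    ok = False
                    break
            if ok:
                hits.append((q, d))
    return hits

# Toy data (hypothetical): one index, two different second-stage checks.
db = "ACGTAAGT"
query = "ACGT"
index, care = build_index(db, "101")
base_only = find_hits(query, db, index, care, 3, [])
with_middle = find_hits(query, db, index, care, 3, [1])
```

In this sketch, changing extra_offsets changes the effective seed without touching the index, which mirrors the abstract's point that different neighbor seeds can share one pre-built database index. How the paper actually chooses the extra positions, and how it handles vector-seed scoring, is not covered here.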