OASIS: an online and accurate technique for local-alignment searches on biological sequences

  • Authors:
  • Colin Meek;Jignesh M. Patel;Shruti Kasetty

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

A common query against large protein and gene sequence data sets is to locate targets that are similar to an input query sequence. The current set of popular search tools, such as BLAST, employ heuristics to improve the speed of such searches. However, such heuristics can sometimes miss targets, which in many cases is undesirable. The alternative to BLAST is to use an accurate algorithm, such as the Smith-Waterman (S-W) algorithm. However, these accurate algorithms are computationally very expensive, which limits their use in practice. This paper takes on the challenge of designing an accurate and efficient algorithm for evaluating local-alignment searches. To meet this goal, we propose a novel search algorithm, called OASIS. This algorithm employs a dynamic programming A*-search driven by a suffix-tree index that is built on the input data set. We experimentally evaluate OASIS and demonstrate that for an important class of searches, in which the query sequence lengths are small, OASIS is more than an order of magnitude faster than S-W. In addition, the speed of OASIS is comparable to BLAST. Furthermore, OASIS returns results in decreasing order of the matching score, making it possible to use OASIS in an online setting. Consequently, we believe that it may now be practically feasible to query large biological sequence data sets using an accurate local-alignment search algorithm.