Searching in Parallel for Similar Strings

  • Authors: Isidore Rigoutsos; Andrea Califano

  • Venue: IEEE Computational Science & Engineering
  • Year: 1994

Abstract

Distributed computation, probabilistic indexing and hashing techniques combine to create a novel approach to processing very large biological-sequence databases. Other data-intensive tasks could also benefit. Our indexing-based approach enables fast similarity searching through a large database of strings. Thanks to a redundant table-lookup scheme, recovering database items that match a test sequence requires minimal data access. We have implemented a uniprocessor version of this approach called Flash (Fast Lookup Algorithm for String Homology) as well as a distributed version, dFlash, using a cluster of seven non-dedicated workstations connected through a local area network. In this article, we present an approach for retrieving homologies in databases of proteins.
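
The abstract does not spell out how the index is built, so the following is only a minimal sketch of index-based similarity lookup in general, not the Flash or dFlash implementation. It assumes contiguous length-k substrings as index keys and a simple vote count per alignment shift; the names `build_index`, `search`, `k`, and `min_votes` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def build_index(database, k=3):
    """Map every length-k substring to the (sequence id, offset) pairs where it occurs.

    Assumption: contiguous k-mers serve as keys. The paper's actual
    indexing scheme is not described in the abstract and may differ.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def search(index, query, k=3, min_votes=2):
    """Return (sequence id, shift, votes) for candidates with enough matching k-mers.

    Each k-mer of the query votes for every (sequence, alignment shift) it is
    consistent with, so a candidate is found by table lookups alone, without
    scanning the database strings.
    """
    votes = Counter()
    for q_offset in range(len(query) - k + 1):
        key = query[q_offset:q_offset + k]
        for seq_id, db_offset in index.get(key, ()):
            votes[(seq_id, db_offset - q_offset)] += 1
    return [
        (seq_id, shift, n)
        for (seq_id, shift), n in votes.most_common()
        if n >= min_votes
    ]


# Toy protein-like strings, invented for this example.
database = ["MKVLAAGIT", "GGGGGGG", "AAMKVLTT"]
index = build_index(database)
hits = search(index, "KVLA")
```

Because every position of a stored string contributes its own key, a single sequence is reachable through many table entries, which is one plausible reading of "redundant" lookup. In a distributed setting, the table could be partitioned across machines and the per-partition vote counts merged, though how dFlash divides the work is not stated here.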