Multi-parallel prefiltering on the Convey HC-1 for supporting homology detection

  • Authors:
  • Fabian Nowak;Michael Bromberger;Martin Schindewolf;Wolfgang Karl

  • Affiliations:
  • Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany;Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

  • Venue:
  • Proceedings of the 20th European MPI Users' Group Meeting
  • Year:
  • 2013

Abstract

Gene databases used in research are huge and continue to grow at a fast pace. Searching these databases for sequences similar (homologous) to a given query sequence requires many comparisons. Therefore, highly parallel architectures and high bandwidth are required for processing and transferring massive amounts of data. The Convey HC-1, with four FPGAs and a high memory bandwidth of up to 76.8 GB/s, seems well suited to this task, as other bioinformatics applications have already benefited greatly from the HC-1. We investigate accelerating an application for searching homologous sequences. Limited only by FPGA size, we present a design that calculates 3 prefiltering scores per FPGA concurrently, i.e., 12 calculations in total. The score of each database sequence against the query profile is calculated by a modified Smith-Waterman scheme that is internally parallelized 16 × 8 = 128-fold, in contrast to the SSE implementation, where only 16-fold parallelism can be exploited and memory bandwidth is the limiting factor. By preloading the query profile, we transform the memory-bound SSE implementation into a compute-bound FPGA design that is limited only by FPGA size. Despite much lower clock rates, the FPGAs outperform SSE in calculating the prefiltering scores by a factor of 4.46. We achieve an application speedup of 1.79 over the original, unmodified state-of-the-art SSE-based implementation; the overall gain is smaller than the kernel gain because the score calculation accounts for less than 63% of the application runtime.
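
The relationship between the kernel speedup (4.46) and the application speedup (1.79) can be checked with Amdahl's law. The sketch below is only an illustration and is not taken from the paper: it assumes the simple model in which a fraction of the runtime is accelerated and the rest is unchanged, and the function names `amdahl_speedup` and `implied_fraction` are hypothetical. Under that assumption, a 63% accelerated share would cap the application speedup at roughly 1.96, and the reported 1.79 corresponds to a share of roughly 57%, which is consistent with "less than 63%".

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated
    by `kernel_speedup` and the remainder is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def implied_fraction(overall_speedup: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: the accelerated runtime fraction that would
    produce `overall_speedup` for a given `kernel_speedup`."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    KERNEL_SPEEDUP = 4.46  # FPGA vs. SSE for the prefiltering scores (from the abstract)
    APP_SPEEDUP = 1.79     # whole-application speedup (from the abstract)

    # Upper bound if the score calculation were exactly 63% of the runtime.
    print(f"bound at 63%: {amdahl_speedup(0.63, KERNEL_SPEEDUP):.2f}")

    # Runtime share implied by the reported application speedup
    # (an estimate under the simple Amdahl model, not a reported figure).
    print(f"implied share: {implied_fraction(APP_SPEEDUP, KERNEL_SPEEDUP):.3f}")
```

Real runs also include effects this model ignores, such as data transfer to the coprocessor, so the implied share is a rough estimate rather than a measured value.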