Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices

  • Authors:
  • Doug Hains;Zach Cashero;Mark Ottenberg;Wim Bohm;Sanjay Rajopadhye

  • Venue:
  • IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
  • Year:
  • 2011

Abstract

CUDASW++ is a parallelization of the Smith-Waterman algorithm for CUDA graphics processing units that computes the similarity score of a query sequence against each sequence in a database. It computes the score for a given pair of sequences with one of two kernel functions: the inter-task kernel or the intra-task kernel. We identify the intra-task kernel as a major bottleneck in CUDASW++ and develop a new intra-task kernel that is faster than the original. We describe the development of our kernel as a series of incremental changes that provide insight into a number of issues that must be considered when developing any algorithm for the CUDA architecture. We compare the performance of our kernel with the original and show that our intra-task kernel substantially improves the overall performance of CUDASW++, by roughly three to four giga-cell updates per second (GCUPS) on various benchmark databases.
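
For readers unfamiliar with the quantities mentioned in the abstract, the following is a minimal sequential sketch of what a database search computes and how a cell-update rate is usually derived. It is not the CUDASW++ code or either of its kernels. The function names, the match/mismatch scores, and the linear gap penalty are illustrative assumptions; production tools typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of two sequences.

    Assumed scoring scheme (illustrative only): fixed match/mismatch
    scores and a linear gap penalty. Only two rows of the dynamic
    programming table are kept, since just the score is needed.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def search(query, database):
    """Score the query against every database sequence, one pair at a time."""
    return [smith_waterman_score(query, subject) for subject in database]


def gcups(query_len, database_residues, seconds):
    """Giga-cell updates per second: table cells filled / time / 1e9."""
    return query_len * database_residues / seconds / 1e9


if __name__ == "__main__":
    print(search("TACGT", ["ACG", "TTTT", "TACGT"]))
    print(gcups(1000, 10**9, 100.0))
```

Each query-subject pair fills a table of `len(query) * len(subject)` cells, which is why throughput is reported in cell updates per second rather than in sequences per second.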