Optimization schemes and performance evaluation of Smith–Waterman algorithm on CPU, GPU and FPGA

Authors:
Dan Zou; Yong Dou; Fei Xia
Affiliations:
Department of Computer Science, National University of Defense Technology, Changsha 410073, China (all authors)
Venue:
Concurrency and Computation: Practice & Experience
Year:
2012

References (12):

Microparallelism and high-performance protein matching. Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing.

Gene Matching Using JBits. FPL '02: Proceedings of the 12th International Conference on Field-Programmable Logic and Applications (Reconfigurable Computing Is Going Mainstream).

A Systolic Array for the Sequence Alignment Problem.

Hyper customized processors for bio-sequence database scanning on FPGAs. Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays.

Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics.

Families of FPGA-based accelerators for approximate string matching. Microprocessors & Microsystems.

Design and Implementation of a Highly Parameterised FPGA-Based Skeleton for Pairwise Biological Sequence Alignment. FCCM '07: Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

Streamware: programming general-purpose multicore processors using streams. Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems.

Applying SIMD approach to whole genome comparison on commodity hardware. PPAM '07: Proceedings of the 7th International Conference on Parallel Processing and Applied Mathematics.

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. Proceedings of the 37th Annual International Symposium on Computer Architecture.

BLAS Comparison on FPGA, CPU and GPU. ISVLSI '10: Proceedings of the 2010 IEEE Annual Symposium on VLSI.

A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.


Abstract

As competition between CPU and graphics processing unit (GPU) platforms intensifies, performance evaluation has become a focus across many sectors. In this paper, we take the Smith–Waterman (S-W) algorithm, a well-known algorithm for biosequence matching and database searching, as a case study and demonstrate approaches that fully exploit its performance potential on CPU, GPU, and field-programmable gate array (FPGA) computing platforms. On CPU platforms, we apply two optimizations, single instruction, multiple data (SIMD) vectorization and multithreading, together with compiler options, gaining more than 70× speedup over a naive CPU version on quad-core CPU platforms. On GPU platforms, we combine coalesced global memory accesses, shared memory tiles, and loop unfolding, achieving a 50× speedup over the initial GPU version on an NVIDIA GeForce GTX 470 card. Experimental results show that, with both platforms fully optimized and under the same manufacturing technology, the GTX 470 GPU achieves a 12× speedup over an Intel quad-core Q9400 CPU, rather than the 100× reported by some studies. In addition, on FPGA platforms, we customize a linear systolic array for the S-W algorithm on a 45-nm Xilinx FPGA chip (XC6VLX760) with up to 1024 processing elements. Running at a clock rate of only 133 MHz, the FPGA platform delivers the highest performance of the three and is the most power-efficient, drawing only 25 W compared with 190 W for the GTX 470 GPU. Copyright © 2011 John Wiley & Sons, Ltd.
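To make the naive CPU baseline and the FPGA design more concrete, here is a minimal sketch in C. It assumes a linear gap penalty and the illustrative scores +3 match, -3 mismatch, 2 per gap position; the abstract does not state the scoring scheme. `sw_reference` is a straightforward single-threaded scorer of the kind a naive CPU version might use, and `sw_systolic` simulates, cycle by cycle, one plausible linear systolic array in which each processing element (PE) holds one query residue while database residues stream through. The function names and the `MAX_PE` bound are hypothetical, and details of the real design (affine gaps, score bit widths, handling of queries longer than the array) are not described in the abstract.

```c
#include <assert.h>
#include <string.h>

/* Assumed scoring scheme (illustrative only, not taken from the paper):
   +3 match, -3 mismatch, linear gap penalty of 2 per gap position. */
#define MATCH      3
#define MISMATCH (-3)
#define GAP        2
#define MAX_PE  1024   /* hypothetical bound: array size quoted in the abstract */

static int max2(int a, int b) { return a > b ? a : b; }
static int subst(char a, char b) { return a == b ? MATCH : MISMATCH; }

/* Baseline O(m*n) Smith-Waterman score, keeping one column of H:
   H[i][j] = max(0, H[i-1][j-1] + s(q_i, d_j), H[i-1][j] - GAP, H[i][j-1] - GAP). */
int sw_reference(const char *q, const char *d) {
    int m = (int)strlen(q), n = (int)strlen(d), best = 0;
    assert(m <= MAX_PE);
    int col[MAX_PE + 1] = {0};          /* col[i] holds H[i][j-1]; col[0] stays 0 */
    for (int j = 1; j <= n; j++) {
        int diag = 0, up = 0;           /* H[0][j-1] and H[0][j] are 0 */
        for (int i = 1; i <= m; i++) {
            int left = col[i];          /* H[i][j-1] */
            int h = max2(0, diag + subst(q[i - 1], d[j - 1]));
            h = max2(h, max2(up - GAP, left - GAP));
            diag = left;                /* H[i][j-1] is the diagonal for row i+1 */
            up = h;                     /* H[i][j] is "up" for row i+1 */
            col[i] = h;
            best = max2(best, h);
        }
    }
    return best;
}

/* Cycle-level simulation of a linear systolic array (0-based rows/columns,
   out-of-range cells count as 0). PE i holds q[i]; database residues enter
   PE 0 one per cycle and shift right, so at cycle t PE i computes cell
   (i, j = t - i). */
int sw_systolic(const char *q, const char *d) {
    int m = (int)strlen(q), n = (int)strlen(d), best = 0;
    assert(m <= MAX_PE);                /* one PE per query residue */
    int h[MAX_PE] = {0};                /* h[i]: PE i's last output, H[i][j-1] */
    int diag[MAX_PE] = {0};             /* diag[i]: upstream value latched last cycle, H[i-1][j-1] */
    for (int t = 0; t < m + n - 1; t++) {
        /* Visit PEs right-to-left so h[i-1] still holds last cycle's value,
           mimicking registers that all update on the same clock edge. */
        for (int i = m - 1; i >= 0; i--) {
            int j = t - i;
            if (j < 0 || j >= n) continue;        /* PE idle during fill/drain */
            int up = (i == 0) ? 0 : h[i - 1];     /* H[i-1][j] from the upstream PE */
            int s = max2(0, diag[i] + subst(q[i], d[j]));
            s = max2(s, max2(up - GAP, h[i] - GAP));
            diag[i] = up;                          /* becomes H[i-1][j-1] next cycle */
            h[i] = s;
            best = max2(best, s);                  /* running maximum of all cells */
        }
    }
    return best;
}
```

Under these assumed parameters, both functions return 13 for the pair TGTTACGG / GGTTGACTA (the alignment GTT-AC / GTTGAC). The systolic version performs exactly the same m·n cell updates as the baseline, but up to m of them happen in the same cycle, which is how an FPGA array can outperform faster-clocked processors despite its low clock rate.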
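The abstract credits part of the GPU speedup to coalesced global memory accesses. One common way to obtain them, assuming each GPU thread aligns the query against a different database sequence (the abstract does not say how work is mapped to threads), is to store the database interleaved, so that residue i of sequence k sits at index i * count + k. When threads k = 0, 1, 2, ... all read residue i at the same step, they then touch consecutive addresses. The host-side packing is sketched below in C; `interleave` and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical host-side helper: pack `count` database sequences into an
   interleaved (column-major) buffer of len * count bytes, so that residue i
   of sequence k is stored at out[i * count + k]. Shorter sequences are padded
   with `pad`; residues beyond `len` are dropped. */
void interleave(const char *const *seqs, int count, int len, char pad, char *out) {
    assert(count > 0 && len >= 0);
    for (int k = 0; k < count; k++) {
        int slen = (int)strlen(seqs[k]);
        for (int i = 0; i < len; i++)
            /* Thread k would later read out[i * count + k] at step i, so
               adjacent threads hit adjacent bytes (a coalesced access). */
            out[i * count + k] = (i < slen) ? seqs[k][i] : pad;
    }
}
```

For example, packing {"ACG", "T", "GGCA"} with len = 4 and pad '-' yields the bytes ATGC-GG-C--A. This layout only addresses the memory-access pattern; the shared-memory tiling and loop unfolding mentioned in the abstract are separate optimizations not shown here.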
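As a back-of-the-envelope check on the FPGA figures, assume each of the 1024 PEs updates one dynamic-programming cell per clock cycle (an assumption; the abstract reports no cell-update rate). The peak throughput in giga cell updates per second (GCUPS), the corresponding peak efficiency at the quoted 25 W, and the array utilization when a query of length m occupies m PEs and each database sequence of length n is processed as a separate pass would then be:

```latex
\begin{aligned}
R_{\text{peak}} &= N_{\text{PE}} \cdot f_{\text{clk}} = 1024 \times 133\ \text{MHz} \approx 1.362 \times 10^{11}\ \text{cell updates/s} \approx 136.2\ \text{GCUPS},\\
E_{\text{peak}} &= \frac{R_{\text{peak}}}{25\ \text{W}} \approx 5.45\ \text{GCUPS/W},\\
\eta(m, n) &= \frac{m\,n}{m\,(m + n - 1)} = \frac{n}{m + n - 1}.
\end{aligned}
```

These are upper bounds derived only from the clock rate and PE count, not measured results; pipeline fill and drain (captured by the utilization term), host I/O, and queries that do not fill the array all reduce the sustained rate.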