Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA

Authors:
Yongchao Liu;Bertil Schmidt;Douglas L. Maskell
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore 639798;School of Computer Engineering, Nanyang Technological University, Singapore 639798;School of Computer Engineering, Nanyang Technological University, Singapore 639798
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Citing 0
Cited 3

GPU parallelization of algebraic dynamic programming

PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part II
GPU-based high throughput multiple sequence alignment algorithm for protein data: a preliminary study

Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
Frequency-based re-sequencing tool for short reads on graphics processing units

International Journal of Computational Science and Engineering

Abstract

Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline: (i) pairwise distance computation; (ii) phylogenetic tree reconstruction; and (iii) progressive multiple alignment computation. Previous work on accelerating ClustalW was mainly focused on parallelizing the first stage and achieved good speedups for a few hundred input sequences. However, if the input size grows to several thousand sequences, the second stage can dominate the overall runtime. In this paper, we present a new approach to accelerating this second stage using graphics processing units (GPUs). In order to derive an efficient mapping onto the GPU architecture, we present a parallelization of the neighbor-joining tree reconstruction algorithm using CUDA. Our experimental results show speedups of over 26× for large datasets compared to the sequential implementation.
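For orientation, the second pipeline stage can be illustrated with a minimal sequential sketch of the textbook neighbor-joining algorithm. This is not the authors' CUDA implementation, which the abstract does not describe in detail; the function name `neighbor_joining`, the `nodeK` labels, and the five-taxon example matrix are all hypothetical choices made for illustration. Each iteration scans every pair of active nodes for the minimum Q value, which takes quadratic work per join and cubic work overall, and this is presumably why the stage becomes costly for several thousand sequences.

```python
def neighbor_joining(names, dist):
    """Sequential textbook neighbor-joining (illustrative sketch only).

    names: list of taxon labels.
    dist:  symmetric matrix (list of lists) of pairwise distances.
    Returns (joins, last_edge), where each join is a tuple
    (child_a, child_b, new_node, branch_len_a, branch_len_b) and
    last_edge is (node_x, node_y, length) for the final two nodes.
    """
    # Nested dict of distances so nodes can be added and retired by label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    joins = []
    next_id = 0

    while len(active) > 2:
        n = len(active)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in active) for a in active}

        # Find the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = active[i], active[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best

        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a

        # Create the new node and compute its distance to every other node.
        u = "node%d" % next_id
        next_id += 1
        d[u] = {u: 0.0}
        for k in active:
            if k == a or k == b:
                continue
            d_uk = 0.5 * (d[a][k] + d[b][k] - d[a][b])
            d[u][k] = d_uk
            d[k][u] = d_uk

        active = [k for k in active if k != a and k != b] + [u]
        joins.append((a, b, u, len_a, len_b))

    x, y = active
    return joins, (x, y, d[x][y])


# Hypothetical five-taxon example (an additive distance matrix).
example_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, last_edge = neighbor_joining(example_names, example_dist)
for join in joins:
    print(join)
print(last_edge)
```

In this sketch the inner pair scan and the distance update are independent across pairs and across nodes respectively, so they are the natural candidates for data-parallel execution; how the paper actually maps them onto GPU threads is not stated in the abstract.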