CUDA-MEME: Accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units

Authors:
Yongchao Liu;Bertil Schmidt;Weiguo Liu;Douglas L. Maskell
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore (all authors)
Venue:
Pattern Recognition Letters
Year:
2010

Abstract

Motif discovery in biological sequences is of prime importance and a major challenge in computational biology. Consequently, numerous motif discovery tools have been developed to date. However, the rapid growth of both genomic sequence data and gene transcription data establishes the need for scalable motif discovery tools. One approach to improving the runtime of motif discovery by an order of magnitude without losing sensitivity is to employ emerging many-core architectures such as CUDA-enabled GPUs. In this paper, we present a highly parallel formulation and implementation of the MEME motif discovery algorithm using the CUDA programming model. To achieve high efficiency, we introduce two parallelization approaches: sequence-level and substring-level parallelization. Furthermore, we describe a hybrid computing framework that takes advantage of both CPU and GPU compute resources. Our performance evaluation on a GeForce GTX 280 GPU shows average runtime speedups of 21.4 (19.3) for the starting point search and 20.5 (16.4) for the overall runtime using the OOPS (ZOOPS) motif search model. The runtime speedups of CUDA-MEME on a single GPU are also comparable to those of ParaMEME running on 16 CPU cores of a high-performance workstation cluster. In addition to its speed, CUDA-MEME finds motif instances consistent with those found by the sequential MEME.
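The abstract names two work decompositions but does not define them. The sketch below illustrates one plausible reading for the starting point search, in plain Python rather than CUDA: a coarse decomposition with one task per input sequence, and a fine decomposition with one task per length-w substring. The scoring function is a toy match count chosen for illustration, not MEME's actual objective, and all function names are hypothetical. The point is only that both decompositions cover the same candidate seeds and therefore yield the same scores.

```python
def best_match(seed, seq):
    """Toy score: most characters of `seed` matched by any same-length window of `seq`."""
    w = len(seed)
    return max(
        sum(a == b for a, b in zip(seed, seq[i:i + w]))
        for i in range(len(seq) - w + 1)
    )


def score_seed(seed, seqs):
    """Sum of per-sequence best matches (a stand-in objective, not MEME's)."""
    return sum(best_match(seed, s) for s in seqs)


def sequence_level_tasks(seqs, w):
    """Coarse decomposition: one task per sequence, covering all of its substrings."""
    return [[(n, i) for i in range(len(s) - w + 1)] for n, s in enumerate(seqs)]


def substring_level_tasks(seqs, w):
    """Fine decomposition: one task per (sequence, offset) substring."""
    return [[(n, i)] for n, s in enumerate(seqs) for i in range(len(s) - w + 1)]


def run(tasks, seqs, w):
    """Run tasks serially; on a GPU each task would be assigned to a parallel worker."""
    scores = {}
    for task in tasks:
        for n, i in task:
            scores[(n, i)] = score_seed(seqs[n][i:i + w], seqs)
    return scores


seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
w = 4

coarse_tasks = sequence_level_tasks(seqs, w)
fine_tasks = substring_level_tasks(seqs, w)
coarse = run(coarse_tasks, seqs, w)
fine = run(fine_tasks, seqs, w)

# Both decompositions evaluate the same seeds, so the scores must agree.
assert coarse == fine

best = max(coarse, key=coarse.get)
best_seed = seqs[best[0]][best[1]:best[1] + w]
print(len(coarse_tasks), len(fine_tasks), best_seed, coarse[best])
```

Under this reading, the trade-off is one of granularity: the coarse form creates few, large tasks, while the fine form creates many small ones, which exposes more parallelism when the number of sequences is small relative to the number of available GPU threads.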