Elementary Flux Modes (EFMs) can be used to characterize functional cellular networks and have gained importance in systems biology. Enumerating EFMs is compute-intensive because of the combinatorial explosion in candidate generation. Parallel implementations exist for shared-memory SMP and distributed-memory architectures, but tools supporting heterogeneous platforms have not yet been developed. Here we propose and evaluate a heterogeneous implementation of combinatorial candidate generation that employs GPUs as accelerators. It uses a three-stage, pipeline-based method to manage arithmetic intensity. Our implementation achieves a 6x speedup over the serial implementation and a 1.8x speedup over a multithreaded implementation for CPU-only SMP architectures.
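To illustrate why candidate generation grows combinatorially, the following is a minimal CPU-only sketch of the pairing step used in double-description-style EFM enumeration. It is an assumption for illustration, not the GPU pipeline described in the abstract. The function name `generate_candidates` and the tuple representation of modes are hypothetical. Each mode with a positive entry in the row being processed is combined with each mode that has a negative entry there, so the number of candidates is the product of the two set sizes.

```python
from itertools import product


def generate_candidates(modes, k):
    """Pair every mode that is positive at index k with every mode that is
    negative at index k, and combine each pair so entry k cancels to zero.

    `modes` is a list of equal-length integer tuples (hypothetical layout).
    Returns len(pos) * len(neg) candidates; a real enumerator would still
    have to test each one for elementarity before keeping it.
    """
    pos = [m for m in modes if m[k] > 0]
    neg = [m for m in modes if m[k] < 0]

    candidates = []
    for p, n in product(pos, neg):
        # Positive weights chosen so that a * p[k] + b * n[k] == 0.
        a, b = -n[k], p[k]
        candidates.append(tuple(a * x + b * y for x, y in zip(p, n)))
    return candidates


if __name__ == "__main__":
    modes = [(1, 0, 2), (0, 1, -1), (1, 1, -3), (2, 0, 0)]
    for c in generate_candidates(modes, 2):
        print(c)
```

Because every positive/negative pair is independent of the others, this loop is the kind of work that can be distributed across many threads. How the actual implementation partitions it across its pipeline stages is not specified here.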