Locality and parallelism optimization for dynamic programming algorithm in bioinformatics

Authors:
Guangming Tan;Shengzhong Feng;Ninghui Sun
Affiliations:
Chinese Academy of Sciences and Graduate School of Chinese Academy of Sciences;Chinese Academy of Sciences;Chinese Academy of Sciences
Venue:
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Year:
2006

Abstract

Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, the performance of dynamic programming algorithms is limited by the drastic growth in both the volume of biological data and the variety of computer architectures. To address this, we develop algorithms that improve memory efficiency and network latency tolerance for nonserial polyadic dynamic programming, in which the dependences are nonuniform. By relaxing the nonuniform dependences, we propose a new cache-oblivious scheme that improves performance on memory-hierarchy architectures. We also extend a tiling technique to parallelize nonserial polyadic dynamic programming, using an alternate block-cyclic mapping strategy to balance the computational and memory load. An analytical parameterized model determines the tile volume size that minimizes total execution time. An algorithmic transformation schedules the tiles so that communication overlaps with computation, further reducing communication overhead on parallel architectures. Numerical experiments were carried out on several high-performance computer systems. The new cache-oblivious dynamic programming algorithm achieves a speedup of 2 to 10. The parallel tiling algorithm with communication-computation overlap shows promising potential for fine-grained parallel computing on massively parallel computer systems.
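The abstract does not give the recurrence, so the following Python sketch makes an assumption to illustrate the dependence structure. It uses the classic matrix-chain ordering recurrence, a textbook instance of nonserial polyadic dynamic programming:

c[i][j] = min over i <= k < j of c[i][k] + c[k+1][j] + p[i]*p[k+1]*p[j+1]

Each cell reads a whole row segment and a whole column segment of earlier cells, so its dependence distances grow with j - i (they are nonuniform). The sketch shows only a plain square-tile wavefront schedule and checks it against a diagonal-order reference. It is not the paper's cache-oblivious scheme, its alternate block-cyclic mapping, or its tile-size model. The function names `cell`, `chain_cost_naive` and `chain_cost_tiled` are hypothetical.

```python
import math
from typing import List

# Assumed instance of nonserial polyadic DP: matrix-chain ordering.
# c[i][j] = min_{i <= k < j} c[i][k] + c[k+1][j] + p[i]*p[k+1]*p[j+1]
# Cell (i, j) reads the row segment c[i][i..j-1] and the column segment
# c[i+1..j][j], so dependence distances vary with j - i (nonuniform).


def cell(c: List[List[int]], p: List[int], i: int, j: int) -> int:
    """Evaluate one cell of the recurrence from already-computed cells."""
    return min(c[i][k] + c[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
               for k in range(i, j))


def chain_cost_naive(p: List[int]) -> int:
    """Reference order: fill the upper triangle one diagonal (j - i) at a time."""
    n = len(p) - 1
    if n <= 0:
        return 0
    c = [[0] * n for _ in range(n)]
    for d in range(1, n):
        for i in range(n - d):
            c[i][i + d] = cell(c, p, i, i + d)
    return c[0][n - 1]


def chain_cost_tiled(p: List[int], b: int) -> int:
    """Hypothetical square-tile wavefront schedule with tile size b.

    Tiles (ti, tj) are visited by tile diagonal tj - ti. Every cell a tile
    depends on lies in a tile on a smaller tile diagonal or in the same
    tile, so the tiles of one diagonal are mutually independent and could
    be handed to different processors. Inside a tile, rows run bottom-up
    (i descending) and columns left-to-right (j ascending), which satisfies
    the in-tile dependences.
    """
    n = len(p) - 1
    if n <= 0:
        return 0
    c = [[0] * n for _ in range(n)]
    nt = math.ceil(n / b)  # tiles per dimension
    for dt in range(nt):            # wavefront step (tile diagonal)
        for ti in range(nt - dt):   # independent tiles on this step
            tj = ti + dt
            for i in range(min((ti + 1) * b, n) - 1, ti * b - 1, -1):
                for j in range(max(tj * b, i + 1), min((tj + 1) * b, n)):
                    c[i][j] = cell(c, p, i, j)
    return c[0][n - 1]


if __name__ == "__main__":
    dims = [30, 35, 15, 5, 10, 20, 25]  # classic textbook example
    print(chain_cost_naive(dims))       # 15125
    print(all(chain_cost_tiled(dims, b) == 15125 for b in range(1, 8)))  # True
```

The tile loop runs sequentially here. Its point is that the wavefront order is legal for every tile size b. A parallel version would distribute the tiles of each step across processors and exchange boundary rows and columns between steps. That exchange is the communication the paper overlaps with computation.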