POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Parallel algorithms for dynamic programming recurrences with more than O(1) dependency
Journal of Parallel and Distributed Computing
Virtual memory mapped network interface for the SHRIMP multicomputer
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Unimodular transformations of non-perfectly nested loops
Parallel Computing
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Optimal orthogonal tiling of 2-D iterations
Journal of Parallel and Distributed Computing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Optimization of an RNA folding algorithm for parallel architectures
Parallel Computing
Reuse-driven tiling for improving data locality
International Journal of Parallel Programming
Optimal tiling for the RNA base pairing problem
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
The Virtual Interface Architecture
IEEE Micro
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
A blocked all-pairs shortest-paths algorithm
Journal of Experimental Algorithmics (JEA)
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
Workload Characterization of Bioinformatics Applications
MASCOTS '05 Proceedings of the 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
High performance RDMA-based MPI implementation over infiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
BioBench: A Benchmark Suite of Bioinformatics Applications
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
An experimental study of optimizing bioinformatics applications
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IEEE Transactions on Parallel and Distributed Systems
A parallel dynamic programming algorithm on a multi-core architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Parallelizing query optimization
Proceedings of the VLDB Endowment
Languages and Compilers for Parallel Computing
Dependency-aware reordering for parallelizing query optimization in multi-core CPUs
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Hi-index | 0.00 |
Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biological data and variety of the computer architectures. With regard to such predicament, this paper creates excellent algorithms aimed at addressing the challenges of improving memory efficiency and network latency tolerance for nonserial polyadic dynamic programming where the dependences are nonuniform. By relaxing the nonuniform dependences, we proposed a new cache oblivious scheme to enhance its performance on memory hierarchy architectures. Moreover we develop and extend a tiling technique to parallelize this nonserial polyadic dynamic programming using an alternate block-cyclic mapping strategy for balancing the computational and memory load, where an analytical parameterized model is formulated to determine the tile volume size that minimizes the total execution time and an algorithmic transformation is used to schedule the tile to overlap communication with computation to further minimize communication overhead on parallel architectures. The numerical experiments were carried out on several high performance computer systems. The new cache-oblivious dynamic programming algorithm achieve 2-10 speedup and the parallel tiling algorithm with communication-computation overlapping shows a desired potential for fine-grained parallel computing on massively parallel computer systems.