Dynamic programming is an efficient technique for solving combinatorial search and optimization problems, and many parallel dynamic programming algorithms have been proposed. This paper studies a family of dynamic programming algorithms in which data dependences appear between non-consecutive stages; in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Because of the non-uniform data dependence, it is harder to optimize this class of problems for parallelism and locality on parallel architectures. In this paper, we address the challenge of exploiting the fine-grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with a memory hierarchy. Within the framework of this model, both parallelism and locality benefit from a data dependence transformation. We propose a parallel pipelined algorithm that fills the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates memory access latency through multithreading and is easily improved with tiling. We formulate and analytically solve the optimization problem of determining the tile size that minimizes the total execution time. Experiments on a simulator validate the proposed model and show that the fine-grain parallel algorithm achieves sub-linear speedup and has the potential for high scalability on multi-core architectures.
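To make the notion of non-uniform data dependence concrete, the following is a minimal sequential sketch of a nonserial polyadic recurrence. It assumes the commonly used form m[i][j] = min(m[i][j], min over i < k < j of m[i][k] + m[k][j]) on an upper-triangular matrix; the abstract does not state the recurrence, so this form, the function name `fill_npdp`, and the diagonal traversal order are illustrative assumptions, not the paper's pipelined or tiled algorithm.

```python
import math


def fill_npdp(initial):
    """Fill an upper-triangular DP matrix for an assumed NPDP recurrence.

    Assumed recurrence (not taken from the paper):
        m[i][j] = min(m[i][j], min_{i < k < j} m[i][k] + m[k][j])

    Cell (i, j) reads every cell to its left in row i and every cell
    below it in column j, so the dependence distance grows with j - i.
    This is what makes the dependence non-uniform.
    """
    n = len(initial)
    m = [row[:] for row in initial]  # work on a copy; the input is left untouched

    # Traverse by diagonals (span = j - i). All operands of a cell lie on
    # shorter diagonals, so cells on one diagonal are mutually independent.
    for span in range(2, n):
        for i in range(n - span):
            j = i + span
            best = m[i][j]
            for k in range(i + 1, j):
                candidate = m[i][k] + m[k][j]
                if candidate < best:
                    best = candidate
            m[i][j] = best
    return m


if __name__ == "__main__":
    INF = math.inf
    costs = [[INF] * 4 for _ in range(4)]
    costs[0][1] = costs[1][2] = costs[2][3] = 1
    costs[0][3] = 10
    print(fill_npdp(costs)[0][3])
```

Because the cells on one diagonal do not depend on each other, the inner loop over `i` is the natural place to look for parallelism, while the loop over `k` is where the long, variable-length reads along a row and a column put pressure on locality.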