We investigate the one-dimensional decomposition of nonuniform workload arrays with optimal load balancing, a problem studied in the literature as the "chains-on-chains partitioning" problem. Despite the rich literature on exact algorithms, heuristics are still used in the parallel computing community, in the "hope" of obtaining good decompositions and because of the "myth" that exact algorithms are hard to implement and not runtime efficient. We show that exact algorithms yield significant improvements in load balance over heuristics with negligible overhead. We start with a literature review and propose improvements and efficient implementation tips for the existing algorithms. We also introduce novel algorithms that are efficient both asymptotically and in measured runtime. Detailed pseudocode of the proposed algorithms is provided for reproducibility. Our experiments on sparse matrix and direct volume rendering datasets verify that load balance can be significantly improved by using exact algorithms. On average, the proposed exact algorithms are 100 times faster than a single sparse matrix-vector multiplication for 64-way decompositions. We conclude that exact algorithms with the proposed efficient implementations can effectively replace heuristics.
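
To make the contrast between heuristics and exact algorithms concrete, the following minimal Python sketch partitions a chain of task weights into consecutive pieces so that the heaviest piece (the bottleneck) is as light as possible. It is not one of the algorithms proposed in the paper: it uses the textbook greedy probe combined with bisection over integer bottleneck values, and it assumes non-negative integer weights. The function names `probe`, `exact_bottleneck`, and `heuristic_bottleneck`, as well as the sample workload, are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right
from itertools import accumulate


def probe(prefix, parts, bottleneck):
    """Greedy feasibility test: can the chain be cut into at most `parts`
    consecutive pieces, each of load <= bottleneck?
    prefix[i] holds the sum of the first i weights."""
    n = len(prefix) - 1
    start = 0
    for _ in range(parts):
        # Extend the current piece as far as the bottleneck allows.
        start = bisect_right(prefix, prefix[start] + bottleneck) - 1
        if start >= n:
            return True
    return False


def exact_bottleneck(weights, parts):
    """Optimal bottleneck value, found by bisection over integer candidates
    (assumption: weights are non-negative integers)."""
    prefix = [0] + list(accumulate(weights))
    total = prefix[-1]
    # Lower bound: the largest task, or the ideal average load rounded up.
    lo = max(max(weights), -(-total // parts))
    hi = total
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(prefix, parts, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def heuristic_bottleneck(weights, parts):
    """Simple heuristic: put the p-th separator at the last index whose
    prefix sum does not exceed p/parts of the total load."""
    prefix = [0] + list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    for p in range(1, parts):
        j = bisect_right(prefix, p * total / parts) - 1
        cuts.append(max(j, cuts[-1]))  # keep separators non-decreasing
    cuts.append(len(weights))
    return max(prefix[b] - prefix[a] for a, b in zip(cuts, cuts[1:]))


if __name__ == "__main__":
    workload = [3, 3, 3, 1, 1, 1, 6, 6]   # illustrative data, not from the paper
    print(heuristic_bottleneck(workload, 3))  # 12
    print(exact_bottleneck(workload, 3))      # 9
```

On this small workload the ideal load per part is 8; the heuristic produces a bottleneck of 12, while the exact search finds 9. Each probe costs one binary search per part, so the exact search adds only a modest number of cheap probes on top of the prefix-sum computation that the heuristic also needs.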