Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. Loops with data dependencies require synchronization via cross-node communication, and this synchronization must be tuned carefully to offset the communication overhead and yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self-scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold on the minimum amount of work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity: it matches exactly in the best case and deviates by at most 38.4% in the worst case. Copyright © 2012 John Wiley & Sons, Ltd.
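The abstract refers to estimating the total number of scheduling steps under a minimum-work threshold. The paper's own closed-form formulas are not reproduced here; as a hypothetical illustration only, the following sketch simulates the step count for one well-known dynamic self-scheduling scheme, guided self-scheduling (GSS), with a minimum chunk size imposed. The function name and parameters are illustrative, not taken from the paper.

```python
def gss_steps(n_iterations: int, n_processors: int, min_chunk: int) -> int:
    """Simulate guided self-scheduling with a minimum chunk threshold.

    At each scheduling step a processor receives ceil(R / P) iterations
    of the remaining work R, but never fewer than min_chunk (and never
    more than what remains). Returns the total number of scheduling
    steps, i.e., the quantity the paper's formulas estimate in closed
    form for its schemes.
    """
    remaining = n_iterations
    steps = 0
    while remaining > 0:
        chunk = max(-(-remaining // n_processors), min_chunk)  # ceil division
        chunk = min(chunk, remaining)  # last chunk may be smaller
        remaining -= chunk
        steps += 1
    return steps


# A larger minimum chunk coarsens the schedule: fewer scheduling steps,
# hence fewer synchronization points, at the cost of coarser load balance.
print(gss_steps(100, 4, 1))   # fine-grained schedule
print(gss_steps(100, 4, 10))  # coarser schedule, fewer steps
```

Raising the threshold trades scheduling/synchronization overhead against load-balancing flexibility, which is exactly the trade-off the model's optimal granularity resolves.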