In this work, we develop and evaluate a theoretical model, which we then use to study the impact of the synchronization frequency on the performance of dynamic self-scheduling algorithms. These algorithms are used to parallelize loops with data dependencies on heterogeneous systems. Inter-node communication has been shown to be the dominant cause of performance degradation in applications containing such loops, so the synchronization mechanism must be carefully tuned to obtain the best possible performance. The proposed model uses a formula that estimates the parallel time as a function of the synchronization frequency, and from it determines the optimal synchronization frequency, i.e., the one that yields the minimum parallel time. We apply the model to the parallel execution of a computational kernel from image processing. For this kernel, the synchronization frequency that minimizes the parallel time according to our model was very close to the one that gave the lowest parallel time in practice. We validate the model through extensive comparisons of the theoretically predicted parallel time and optimal synchronization frequency against those obtained experimentally. The comparisons show that the model is highly accurate: its predicted optimal synchronization frequency is within 0.0250% of the experimentally optimal value in the best case and within 0.1750% in the worst case. Finally, the comparisons show that the proposed model improves on a previously existing model for heterogeneous systems, while giving similar results for homogeneous systems.
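The trade-off such a model captures can be illustrated with a deliberately simplified cost model. The abstract does not give the authors' formula, so the sketch below substitutes a classic pipelined-block model, assumed here purely for illustration. It makes the following hypothetical assumptions. There are `p` processors arranged in a pipeline, and each processes a loop dimension of `n` iterations. A processor synchronizes with its successor every `h` iterations, so the synchronization frequency is proportional to `1/h`. Each iteration costs `t_comp`, and each synchronization costs `t_sync`. Small `h` means frequent synchronization and high communication overhead, while large `h` means a long pipeline fill time. In the continuous approximation, the time (n/h + p - 1)(h·t_comp + t_sync) is minimized at h* = sqrt(n·t_sync / ((p - 1)·t_comp)).

```python
import math


def parallel_time(h, n, p, t_comp, t_sync):
    """Hypothetical pipelined cost model (not the paper's formula).

    The n iterations are split into ceil(n / h) blocks. With p pipeline
    stages, the last processor finishes after ceil(n / h) + p - 1 block steps.
    Each step costs h * t_comp of computation plus one synchronization t_sync.
    """
    steps = math.ceil(n / h) + p - 1
    return steps * (h * t_comp + t_sync)


def optimal_h_closed_form(n, p, t_comp, t_sync):
    """Minimizer of the continuous version of parallel_time (requires p >= 2)."""
    if p < 2:
        raise ValueError("pipeline model needs at least two processors")
    return math.sqrt(n * t_sync / ((p - 1) * t_comp))


def optimal_h_search(n, p, t_comp, t_sync):
    """Exhaustive search over integer synchronization intervals 1..n."""
    return min(range(1, n + 1),
               key=lambda h: parallel_time(h, n, p, t_comp, t_sync))


if __name__ == "__main__":
    # Illustrative parameters only; they are not measurements from the paper.
    n, p, t_comp, t_sync = 10_000, 5, 1e-6, 1e-4
    h_model = optimal_h_closed_form(n, p, t_comp, t_sync)
    h_best = optimal_h_search(n, p, t_comp, t_sync)
    print(f"closed-form h* = {h_model:.1f}, searched h = {h_best}")
    print(f"predicted time at searched h = {parallel_time(h_best, n, p, t_comp, t_sync):.6f} s")
```

Comparing the closed-form optimum with an exhaustive search mirrors, in miniature, the kind of predicted-versus-observed comparison the abstract describes. In a real study, the searched value would come from timed runs rather than from the model itself. A heterogeneous system would also need per-processor speeds, which this toy model deliberately omits.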