Parallel nested loops are the largest potential source of parallelism in numerical and scientific applications. Executing parallel loops with low run-time overhead is therefore essential for achieving high performance on parallel computers. Guided self-scheduling (GSS) has long been used for dynamic scheduling of parallel loops on shared-memory parallel machines and for efficient utilization of dynamically allocated processors. To minimize the synchronization (scheduling) overhead in GSS, loop coalescing has been proposed as a restructuring technique that transforms a nested loop into a single loop; in effect, coalescing "flattens" the iteration space in lexicographic order of the indices of the original loop. Although coalescing reduces the run-time scheduling overhead, it does not necessarily minimize the makespan, i.e., the maximum finishing time, especially when the execution time (workload) of the iterations is non-uniform, as is often the case in practice, e.g., in control-intensive applications. This is because the makespan depends directly on the workload distribution across the flattened iteration space, which in turn depends on the order in which the loop indices are coalesced. We show that coalescing, as originally proposed, can result in large makespans. In this paper, we present a loop permutation-based approach to loop coalescing, referred to as enhanced loop coalescing, that achieves near-optimal schedules. Several examples are presented, and the general technique is discussed in detail.
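The effect described above can be sketched with a small simulation. The following is an illustrative toy model, not the paper's implementation: it uses the standard GSS chunk rule (each idle processor takes ceil(remaining/P) of the remaining iterations) and a hypothetical per-iteration cost function whose weight depends only on one loop index, so that one coalescing order clusters the heavy iterations while the permuted order interleaves them. The loop bounds, processor count, and cost model are all assumptions chosen for illustration.

```python
import math

def gss_makespan(costs, num_procs):
    """Simulate guided self-scheduling over a flattened iteration space.

    `costs` is the list of per-iteration execution times in the order
    produced by coalescing. Each time a processor becomes idle it takes
    a chunk of ceil(remaining / num_procs) consecutive iterations (the
    classic GSS rule), and we return the maximum finishing time.
    """
    n = len(costs)
    finish = [0.0] * num_procs   # per-processor finishing time
    start = 0
    while start < n:
        chunk = math.ceil((n - start) / num_procs)      # GSS chunk size
        p = min(range(num_procs), key=lambda i: finish[i])  # earliest-idle proc
        finish[p] += sum(costs[start:start + chunk])
        start += chunk
    return max(finish)

# Hypothetical non-uniform workload: the cost of iteration (i, j) depends
# only on i and decreases with it, so coalescing with i outermost places
# all the heavy iterations in the first (largest) GSS chunks.
N, M, P = 8, 8, 4
cost = lambda i, j: 1 + 10 * (N - 1 - i)

# Coalesce in lexicographic (i, j) order vs. the permuted (j, i) order.
ij_order = [cost(i, j) for i in range(N) for j in range(M)]
ji_order = [cost(i, j) for j in range(M) for i in range(N)]

print("makespan, i outer:", gss_makespan(ij_order, P))
print("makespan, j outer:", gss_makespan(ji_order, P))
```

Running the sketch shows a noticeably smaller makespan for the permuted order: interleaving the heavy iterations spreads them across GSS chunks, while the lexicographic order loads the first large chunk with most of the work. This is the intuition behind choosing a loop permutation before coalescing.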