Parallel nested loops are the largest potential source of parallelism in numerical and scientific applications. Executing parallel loops with low run-time overhead is therefore essential for achieving high performance on parallel computers. Guided self-scheduling (GSS) has long been used for dynamic scheduling of parallel loops on shared-memory parallel machines and for efficient utilization of dynamically allocated processors. To minimize the synchronization (scheduling) overhead in GSS, loop coalescing has been proposed as a restructuring technique that transforms a nested loop into a single loop; in effect, coalescing "flattens" the iteration space in lexicographic order of the indices of the original loop. Although coalescing reduces the run-time scheduling overhead, it does not necessarily minimize the makespan, i.e., the maximum finishing time, especially when the execution time (workload) of the iterations is non-uniform, as is often the case in practice, e.g., in control-intensive applications. This is because the makespan depends directly on the workload distribution across the flattened iteration space, which in turn depends on the order in which the loop indices are coalesced. We show that coalescing, as originally proposed, can result in large makespans. In this paper, we present a loop permutation-based approach to loop coalescing, referred to as enhanced loop coalescing, that achieves near-optimal schedules. Several examples are presented, and the general technique is discussed in detail.
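The effect described above can be sketched with a small simulation. The following is an illustrative toy model, not the paper's implementation: it uses the standard GSS chunk rule (each idle processor takes ceil(remaining/P) of the remaining iterations) and a hypothetical per-iteration cost function whose weight depends only on one loop index, so that one coalescing order clusters the heavy iterations while the permuted order interleaves them. The loop bounds, processor count, and cost model are all assumptions chosen for illustration.

```python
import math

def gss_makespan(costs, num_procs):
    """Simulate guided self-scheduling over a flattened iteration space.

    `costs` is the list of per-iteration execution times in the order
    produced by coalescing. Each time a processor becomes idle it takes
    a chunk of ceil(remaining / num_procs) consecutive iterations (the
    classic GSS rule), and we return the maximum finishing time.
    """
    n = len(costs)
    finish = [0.0] * num_procs   # per-processor finishing time
    start = 0
    while start < n:
        chunk = math.ceil((n - start) / num_procs)      # GSS chunk size
        p = min(range(num_procs), key=lambda i: finish[i])  # earliest-idle proc
        finish[p] += sum(costs[start:start + chunk])
        start += chunk
    return max(finish)

# Hypothetical non-uniform workload: the cost of iteration (i, j) depends
# only on i and decreases with it, so coalescing with i outermost places
# all the heavy iterations in the first (largest) GSS chunks.
N, M, P = 8, 8, 4
cost = lambda i, j: 1 + 10 * (N - 1 - i)

# Coalesce in lexicographic (i, j) order vs. the permuted (j, i) order.
ij_order = [cost(i, j) for i in range(N) for j in range(M)]
ji_order = [cost(i, j) for j in range(M) for i in range(N)]

print("makespan, i outer:", gss_makespan(ij_order, P))
print("makespan, j outer:", gss_makespan(ji_order, P))
```

Running the sketch shows a noticeably smaller makespan for the permuted order: interleaving the heavy iterations spreads them across GSS chunks, while the lexicographic order loads the first large chunk with most of the work. This is the intuition behind choosing a loop permutation before coalescing.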