This paper develops a methodology for compiling and executing irregular parallel programs. Such programs implement parallel operations whose size and work distribution depend on input data. We show a fundamental relationship among three quantities that characterize an irregular parallel computation: the total available parallelism, the optimal grain size, and the statistical variance of execution times for individual tasks. This relationship yields a dynamic scheduling algorithm that substantially reduces the overhead of executing irregular parallel operations.

We incorporated this algorithm into an extended Fortran compiler. The compiler accepts as input a subset of Fortran D that includes blocked and cyclic decompositions and perfect alignment. It outputs Fortran 77 augmented with calls to library routines written in C. For irregular parallel operations, the compiled code gathers information about available parallelism and task execution time variance, and uses this information to schedule the operation. On distributed memory architectures, the compiler encodes information about data access patterns for the runtime scheduling system so that the scheduler can preserve communication locality.

We evaluated these compilation techniques using a set of application programs, including climate modeling, circuit simulation, and X-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the dynamic techniques described here achieve near-optimal efficiency on large numbers of processors. On these problems, they also perform significantly better than any previously proposed static or dynamic scheduling algorithm.
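The abstract states the relationship among parallelism, grain size, and variance only qualitatively. The Python sketch below illustrates that idea; it is not the paper's algorithm. It assumes a variance-aware chunk-size rule of the "taper" form: with T = R/p (remaining tasks per processor) and v = α·σ/μ (a scaled coefficient of variation of task times), the next chunk is K = max(K_min, ⌈T + v²/2 − v·√(2T + v²/4)⌉). The names `RunningStats`, `taper_chunk`, and `chunk_sequence`, and the parameters `alpha` and `k_min`, are hypothetical. The runtime gathers task-time statistics online and hands out smaller chunks as variance grows. When all tasks take the same time (σ = 0), it falls back to guided self-scheduling, ⌈R/p⌉.

```python
import math


class RunningStats:
    """Welford's online mean/variance of observed task execution times."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; zero until at least two samples exist.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def taper_chunk(remaining: int, procs: int, mean: float, std: float,
                alpha: float = 1.0, k_min: int = 1) -> int:
    """Next chunk size under an ASSUMED taper-style rule (illustrative only).

    T = remaining / procs, v = alpha * std / mean,
    K = ceil(T + v^2/2 - v*sqrt(2T + v^2/4)), clamped to [max(k_min, 1), remaining].
    K never exceeds ceil(T), and it shrinks as the variance of task times grows.
    """
    if remaining <= 0:
        return 0
    t = remaining / procs
    v = alpha * std / mean if mean > 0 else 0.0
    k = math.ceil(t + v * v / 2 - v * math.sqrt(2 * t + v * v / 4))
    return min(remaining, max(k_min, k, 1))


def chunk_sequence(n_tasks: int, procs: int, mean: float, std: float,
                   alpha: float = 1.0, k_min: int = 1) -> list[int]:
    """Sizes of successive chunks handed to idle processors from one task pool."""
    sizes = []
    remaining = n_tasks
    while remaining > 0:
        k = taper_chunk(remaining, procs, mean, std, alpha, k_min)
        sizes.append(k)
        remaining -= k
    return sizes


if __name__ == "__main__":
    # Hypothetical timings (seconds) measured on a few probe tasks.
    stats = RunningStats()
    for sample in [0.8, 1.1, 0.9, 3.0, 1.2, 0.7]:
        stats.add(sample)
    print(chunk_sequence(1000, 16, stats.mean, stats.std))
```

Welford's update keeps the statistics numerically stable without storing every timing, which suits a runtime that samples task times as it goes. With 1000 tasks on 10 processors, a coefficient of variation of 2 shrinks the first chunk from 100 to 74 tasks. The smaller chunks leave more work in the pool to even out late imbalance, at the cost of more scheduling operations.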