Allocating Independent Subtasks on Parallel Processors

Authors:
Clyde P. Kruskal; Alan Weiss
Affiliations:
Univ. of Illinois at Urbana-Champaign, Urbana; AT&T Bell Laboratories, Murray Hill, NJ
Venue:
IEEE Transactions on Software Engineering
Year:
1985

Citing 0
Cited 110

Guided self-scheduling: A practical scheduling scheme for parallel supercomputers

IEEE Transactions on Computers
The limited performance benefits of migrating active processes for load sharing

SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Acyclic fork-join queuing networks

Journal of the ACM (JACM)
Determining average program execution times and their variance

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
An Analysis of Scatter Decomposition

IEEE Transactions on Computers
Processor scheduling in shared memory multiprocessors

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Dynamic Processor Self-Scheduling for General Parallel Nested Loops

IEEE Transactions on Computers
Asynchronous Disk Interleaving: Approximating Access Delays

IEEE Transactions on Computers
Factoring: a practical and robust method for scheduling parallel loops

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Factoring: a method for scheduling parallel loops

Communications of the ACM
Low-overhead scheduling of nested parallelism

IBM Journal of Research and Development
Automatic partitioning of a program dependence graph into parallel tasks

IBM Journal of Research and Development
A dynamic scheduling method for irregular parallel programs

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Scalability analysis of partitioning strategies for finite element graphs: a summary of results

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Using processor affinity in loop scheduling on shared-memory multiprocessors

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Orchestrating interactions among parallel computations

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Managing pages in shared virtual memory systems: getting the compiler into the game

ICS '93 Proceedings of the 7th international conference on Supercomputing
The influence of random delays on parallel execution times

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Asynchronous analysis of parallel dynamic programming

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Combining static and dynamic scheduling on distributed-memory multiprocessors

ICS '94 Proceedings of the 8th international conference on Supercomputing
Cost/performance of a parallel computer simulator

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems

IEEE Transactions on Parallel and Distributed Systems
Asynchronous Analysis of Parallel Dynamic Programming Algorithms

IEEE Transactions on Parallel and Distributed Systems
Symbolic analysis for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Load-sharing in heterogeneous systems via weighted factoring

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Static Assignment of Stochastic Tasks Using Majorization

IEEE Transactions on Computers
Impact of Memory Contention on Dynamic Scheduling on NUMA Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Dynamic scheduling with incomplete information

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Performance prediction based loop scheduling for heterogeneous computing environment

SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Performance analysis for parallel solutions to generic search problems

SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Analyzing the expected execution times of parallel programs

SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Static performance prediction of data-dependent programs

Proceedings of the 2nd international workshop on Software and performance
Performance Metrics for Embedded Parallel Pipelines

IEEE Transactions on Parallel and Distributed Systems
Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Language and Compiler Support for Adaptive Distributed Applications

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
A Hybrid Solution of Fork/Join Synchronization in Parallel Queues

IEEE Transactions on Parallel and Distributed Systems
Affinity scheduling of unbalanced workloads

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Beyond Execution Time: Expanding the Use of Performance Models

IEEE Parallel & Distributed Technology: Systems & Technology
Parallelizing a GIS on a Shared Address Space Architecture

Computer
Stochastic Bounds for Parallel Program Execution Times with Processor Constraints

IEEE Transactions on Computers
Effectiveness of Parallel Joins

IEEE Transactions on Knowledge and Data Engineering
Declustering and Load-Balancing Methods for Parallelizing Geographic Information Systems

IEEE Transactions on Knowledge and Data Engineering
The Effect of Scheduling Discipline on Spin Overhead in Shared Memory Parallel Systems

IEEE Transactions on Parallel and Distributed Systems
Synchronization and Communication Costs of Loop Partitioning on Shared-Memory Multiprocessor Systems

IEEE Transactions on Parallel and Distributed Systems
Loop Coalescing and Scheduling for Barrier MIMD Architectures

IEEE Transactions on Parallel and Distributed Systems
Performance Analysis and Scheduling of Stochastic Fork-Join Jobs in a Multicomputer System

IEEE Transactions on Parallel and Distributed Systems
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Computing Performance Bounds of Fork-Join Parallel Programs Under a Multiprocessing Environment

IEEE Transactions on Parallel and Distributed Systems
Dynamic Scheduling Parallel Loops with Variable Iterate Execution Times

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Load Balancing Highly Irregular Computations with the Adaptive Factoring

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Performance of Scheduling Scientific Applications with Adaptive Weighted Factoring

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A Theoretical Application of Feedback Guided Dynamic Loop Scheduling

IWCC '01 Proceedings of the NATO Advanced Research Workshop on Advanced Environments, Tools, and Applications for Cluster Computing-Revised Papers
Performance Prediction of Data-Dependent Task Parallel Programs

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Semi-dynamic Multiprocessor Scheduling Algorithm with an Asymptotically Optimal Competitive Ratio

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Symbolic Performance Prediction of Data-Dependent Parallel Programs

TOOLS '02 Proceedings of the 12th International Conference on Computer Performance Evaluation, Modelling Techniques and Tools
Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Scheduling at Twilight the Easy Way

STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Adaptive Computing on the Grid Using AppLeS

IEEE Transactions on Parallel and Distributed Systems
Automatic parallelization for symmetric shared-memory multiprocessors

CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Customized dynamic load balancing for a network of workstations

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Parallel program performance prediction using deterministic task graph analysis

ACM Transactions on Computer Systems (TOCS)
Message-passing parallel adaptive quantum trajectory method

High performance scientific and engineering computing
Simulation of Vector Nonlinear Time Series Models on Clusters

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 13 - Volume 14
A novel approach for partitioning iteration spaces with variable densities

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory multiprocessor support for functional array processing in SAC

Journal of Functional Programming
Low-Cost Static Performance Prediction of Parallel Stochastic Task Compositions

IEEE Transactions on Parallel and Distributed Systems
Design and implementation of a novel dynamic load balancing library for cluster computing

Parallel Computing - Heterogeneous computing
A Load Balancing Tool for Distributed Parallel Loops

Cluster Computing
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit

International Journal of High Performance Computing Applications
Unstructured peer-to-peer networks for sharing processor cycles

Parallel Computing - Parallel matrix algorithms and applications (PMAA'04)
A general approach for partitioning N-dimensional parallel nested loops with conditionals

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Modeling master/worker applications for automatic performance tuning

Parallel Computing - Algorithmic skeletons
Tight analysis of the performance potential of thread speculation using SPEC CPU 2006

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Using the GA and TAO toolkits for solving large-scale optimization problems on parallel computers

ACM Transactions on Mathematical Software (TOMS)
CRAUL: Compiler and run-time integration for adaptation under load

Scientific Programming
New Scheduling Strategies for Randomized Incremental Algorithms in the Context of Speculative Parallelization

IEEE Transactions on Computers
A performance-based parallel loop scheduling on grid environments

The Journal of Supercomputing
Enhancing self-scheduling algorithms via synchronization and weighting

Journal of Parallel and Distributed Computing
Dynamic load balancing with adaptive factoring methods in scientific applications

The Journal of Supercomputing
Performance evaluation of a dynamic load-balancing library for cluster computing

International Journal of Computational Science and Engineering
A practical application of FGDLS to birds flock trajectory

ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers
Performance modeling and analysis of correlated parallel computations

Parallel Computing
Derivation of self-scheduling algorithms for heterogeneous distributed computer systems: Application to internet-based grids of computers

Future Generation Computer Systems
Chunking parallel loops in the presence of synchronization

Proceedings of the 23rd international conference on Supercomputing
Task distribution using factoring load balancing in Master-Worker applications

Information Processing Letters
A directive-based MPI code generator for Linux PC clusters

The Journal of Supercomputing
An adaptive multi-policy grid service for biological sequence comparison

Journal of Parallel and Distributed Computing
Performance-based workload distribution on grid environments

GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Enhanced loop coalescing: a compiler technique for transforming non-uniform iteration spaces

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Adaptive statistical scheduling of divisible workloads in heterogeneous systems

Journal of Scheduling
Integration of Heterogeneous and Non-dedicated Environments for R

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Studying the impact of synchronization frequency on scheduling tasks with dependencies in heterogeneous systems

Performance Evaluation
Simulation of a hybrid model for image denoising

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Dynamic multi phase scheduling for heterogeneous clusters

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A parameter study of a hybrid Laplacian mean-curvature flow denoising model

The Journal of Supercomputing
Distributed dynamic load balancing for pipelined computations on heterogeneous systems

Parallel Computing
Load and performance balancing scheme for heterogeneous parallel processing

CIS'04 Proceedings of the First international conference on Computational and Information Science
An efficient approach for self-scheduling parallel loops on multiprogrammed parallel computers

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A new carried-dependence self-scheduling algorithm

ICCSA'05 Proceedings of the 2005 international conference on Computational Science and its Applications - Volume Part I
Scheduling divisible workloads using the adaptive time factoring algorithm

ICA3PP'05 Proceedings of the 6th international conference on Algorithms and Architectures for Parallel Processing
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Online task scheduling on heterogeneous clusters: an experimental study

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
A-FAST: autonomous flow approach to scheduling tasks

HiPC'04 Proceedings of the 11th international conference on High Performance Computing
A self-adaptive computing framework for parallel maximum likelihood evaluation

The Journal of Supercomputing
Tuning of algorithms for independent task placement in the context of demand-driven parallel ray tracing

EG PGV'04 Proceedings of the 5th Eurographics conference on Parallel Graphics and Visualization
A flexible general-purpose parallelizing architecture for nested loops in reconfigurable platforms

PATMOS'07 Proceedings of the 17th international conference on Integrated Circuit and System Design: power and timing modeling, optimization and simulation
Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems

Concurrency and Computation: Practice & Experience
A Transformation Framework for Optimizing Task-Parallel Programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
Load balancing in a changing world: dealing with heterogeneity and performance variability

Proceedings of the ACM International Conference on Computing Frontiers

Quantified Score

Hi-index	0.04

Abstract

When using MIMD (multiple instruction, multiple data) parallel computers, one is often confronted with solving a task composed of many independent subtasks where it is necessary to synchronize the processors after all the subtasks have been completed. This paper studies how the subtasks should be allocated to the processors in order to minimize the expected time it takes to finish all the subtasks (sometimes called the makespan). We assume that the running times of the subtasks are independent, identically distributed, increasing failure rate random variables, and that assigning one or more subtasks to a processor entails some overhead, or communication time, that is independent of the number of subtasks allocated. Our analyses, which use ideas from renewal theory, reliability theory, order statistics, and the theory of large deviations, are valid for a wide class of distributions. We show that allocating an equal number of subtasks to each processor all at once has good efficiency. This appears as a consequence of a rather general theorem which shows how some consequences of the central limit theorem hold even when we cannot prove that the central limit theorem applies.
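The allocation problem in the abstract can be explored with a small simulation. The sketch below is not the authors' analysis: the function names, the gamma-distributed running times (one example of an increasing failure rate distribution), the parameter values, and the chunked self-scheduling baseline are all assumptions chosen for illustration. It computes the makespan when every processor receives an equal share of subtasks all at once and pays the overhead h a single time, and compares it with a scheme in which idle processors repeatedly grab k subtasks and pay h on every grab.

```python
import heapq
import random


def static_makespan(times, p, h):
    """Equal split, all at once: each processor gets one contiguous batch
    of subtasks and pays the allocation overhead h exactly once."""
    base, extra = divmod(len(times), p)
    finish = []
    start = 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        batch = times[start:start + size]
        start += size
        # A processor that receives no subtasks pays no overhead.
        finish.append((h if size else 0.0) + sum(batch))
    return max(finish)


def chunked_makespan(times, p, h, k):
    """Illustrative baseline (an assumption, not from the abstract):
    whenever a processor is idle it takes the next k subtasks,
    paying the overhead h on every such allocation."""
    free_at = [0.0] * p  # min-heap of the times at which processors go idle
    heapq.heapify(free_at)
    makespan = 0.0
    for start in range(0, len(times), k):
        t = heapq.heappop(free_at)  # earliest idle processor
        t += h + sum(times[start:start + k])
        makespan = max(makespan, t)
        heapq.heappush(free_at, t)
    return makespan


def mean_makespans(n=1024, p=16, h=0.5, k=8, trials=200, seed=0):
    """Monte Carlo estimate of the expected makespan under both schemes.
    Running times are i.i.d. gamma(shape=2, scale=0.5), so the mean is 1."""
    rng = random.Random(seed)
    static_total = chunked_total = 0.0
    for _ in range(trials):
        times = [rng.gammavariate(2.0, 0.5) for _ in range(n)]
        static_total += static_makespan(times, p, h)
        chunked_total += chunked_makespan(times, p, h, k)
    return static_total / trials, chunked_total / trials
```

Calling mean_makespans() returns the two estimated expected makespans as a pair (static, chunked). Varying h, k, and the running-time distribution shows how the trade-off between allocation overhead and load imbalance shifts; the numbers depend entirely on these assumed parameters.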