Scalable work stealing

Authors:
James Dinan;D. Brian Larkins;P. Sadayappan;Sriram Krishnamoorthy;Jarek Nieplocha
Affiliations:
The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;Pacific Northwest National Laboratory, Richland, WA;Pacific Northwest National Laboratory, Richland, WA
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Citing 28
Cited 36

CHARM++: a portable concurrent object oriented system based on C++

OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Scalable load balancing techniques for parallel computers

Journal of Parallel and Distributed Computing
A provable time and space efficient implementation of NESL

Proceedings of the first ACM SIGPLAN international conference on Functional programming
Thread scheduling for multiprogrammed multiprocessors

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Analyses of load stealing models based on differential equations

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Static scheduling algorithms for allocating directed task graphs to multiprocessors

ACM Computing Surveys (CSUR)
A new solution of Dijkstra's concurrent programming problem

Communications of the ACM
Efficient load balancing for wide-area divide-and-conquer applications

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Global arrays: a portable "shared-memory" programming model for distributed memory computers

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems

Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
The Natural Work-Stealing Algorithm is Stable

FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard
Parallel Programming and Parallel Abstractions in Fortress

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A dynamic-sized nonblocking work stealing deque

Distributed Computing - Special issue: DISC 04
Hypergraph partitioning for automatic memory hierarchy management

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Adaptive and reliable parallel computing on networks of workstations

ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Parallel Programmability and the Chapel Language

International Journal of High Performance Computing Applications
Achieving Distributed Termination without Freezing

IEEE Transactions on Software Engineering
Parallel multilevel algorithms for hypergraph partitioning

Journal of Parallel and Distributed Computing
Scheduling multithreaded computations by work stealing

SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scioto: A Framework for Global-View Task Parallelism

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scalable Dynamic Load Balancing Using UPC

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Idempotent work stealing

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
New challenges in dynamic load balancing

Applied Numerical Mathematics - Adaptive methods for partial differential equations and large-scale computation
UTS: an unbalanced tree search benchmark

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

Proceedings of the 24th ACM International Conference on Supercomputing
Selective Recovery from Failures in a Task Parallel Programming Model

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Lifeline-based global load balancing

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Parallelization libraries: Characterizing and reducing overheads

ACM Transactions on Architecture and Code Optimization (TACO)
Unbalanced tree search on a manycore system using the GPI programming model

Computer Science - Research and Development
Work stealing for multi-core HPC clusters

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services

Performance Evaluation
Accelerating the requirement space exploration through coarse-grained parallel execution

NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Periodic hierarchical load balancing for large supercomputers

International Journal of High Performance Computing Applications
A step-by-step extending parallelism approach for enumeration of combinatorial objects

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
BWS: balanced work stealing for time-sharing multicores

Proceedings of the 7th ACM european conference on Computer Systems
Work stealing strategies for parallel stream processing in soft real-time systems

ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Compiler and runtime support for enabling reduction computations on heterogeneous systems

Concurrency and Computation: Practice & Experience
Work stealing and persistence-based load balancers for iterative overdecomposed applications

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Data-driven fault tolerance for work stealing computations

Proceedings of the 26th ACM international conference on Supercomputing
Performance characterization of global address space applications: a case study with NWChem

Concurrency and Computation: Practice & Experience
WSCOM: Online Task Scheduling with Data Transfers

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiler support for lightweight context switching

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Dynamic distributed scheduling algorithm for state space search

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Using load information in work-stealing on distributed systems with non-uniform communication latencies

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Programming support and scheduling for communicating parallel tasks

Journal of Parallel and Distributed Computing
Adoption protocols for fanout-optimal fault-tolerant termination detection

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling parallel programs by work stealing with private deques

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Inspector/executor load balancing algorithms for block-sparse tensor contractions

Proceedings of the 27th international ACM conference on International conference on supercomputing
Design and implementation of a customizable work stealing scheduler

Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
A work-stealing scheduling framework supporting fault tolerance

Proceedings of the Conference on Design, Automation and Test in Europe
A distributed dynamic load balancer for iterative applications

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A framework for load balancing of tensor contraction expressions via dynamic task partitioning

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
How to be a successful thief: feudal work stealing for irregular divide-and-conquer applications on heterogeneous distributed systems

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Load balancing non-uniform parallel computations

Proceedings of the 2013 workshop on Programming based on actors, agents, and decentralized control
Energy-efficient work-stealing language runtimes

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
X10 and APGAS at Petascale

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
A topology-aware load balancing algorithm for clustered hierarchical multi-core machines

Future Generation Computer Systems
GLB: lifeline-based global load balancing library in x10

Proceedings of the first workshop on Parallel programming for analytics applications
Friendly barriers: efficient work-stealing with return barriers

Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

Abstract

Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require continuous dynamic load balancing to maintain efficiency. Load balancing at this scale is itself a challenging problem, one that distributed dynamic load balancing systems can address. Work stealing is a popular approach to distributed dynamic load balancing; however, its performance on large-scale clusters is not well understood. Prior work on work stealing has largely focused on shared-memory machines. In this work, we investigate the design and scalability of work stealing on modern distributed-memory systems. We demonstrate high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the unbalanced tree search benchmark, and a multiresolution analysis kernel.
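
For readers unfamiliar with the technique, the general idea of work stealing can be illustrated with a minimal, single-process sketch. It is not the implementation evaluated in the paper: the function name `simulate_work_stealing`, the round-based scheduling, the random victim selection, and the steal-half policy are all assumptions chosen for illustration, and no real communication or concurrency is modeled.

```python
import random
from collections import deque


def simulate_work_stealing(num_workers=4, num_tasks=64, seed=0):
    """Toy round-based simulation of work stealing (illustrative only).

    Assumptions (not taken from the paper): each worker owns a deque,
    an idle worker picks a victim uniformly at random, and a successful
    steal moves half of the victim's tasks, always leaving at least one.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(range(num_tasks))  # deliberately imbalanced start
    executed = [0] * num_workers        # tasks completed per worker
    steals = 0                          # number of successful steals

    while any(queues):
        for w in range(num_workers):
            if queues[w]:
                # The owner takes work from the tail of its own deque.
                queues[w].pop()
                executed[w] += 1
            else:
                # An idle worker tries to steal from a random victim.
                victim = rng.randrange(num_workers)
                take = len(queues[victim]) // 2
                if victim != w and take > 0:
                    # The thief takes the oldest tasks from the head.
                    for _ in range(take):
                        queues[w].append(queues[victim].popleft())
                    steals += 1
    return executed, steals


if __name__ == "__main__":
    executed, steals = simulate_work_stealing()
    print("tasks executed per worker:", executed)
    print("successful steals:", steals)
```

Because a victim always keeps at least one task, every round in which work remains completes at least one task, so the simulation terminates. In a distributed-memory setting each steal would also involve remote communication, which is the cost the paper's scalability study is concerned with.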