CHARM++: a portable concurrent object oriented system based on C++
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Scalable load balancing techniques for parallel computers
Journal of Parallel and Distributed Computing
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Proceedings of the ACM 2000 conference on Java Grande
Efficient load balancing for wide-area divide-and-conquer applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
ATLAS: an infrastructure for global computing
EW 7 Proceedings of the 7th workshop on ACM SIGOPS European workshop: Systems support for worldwide applications
State of the Art in Parallel Search Techniques for Discrete Optimization Problems
IEEE Transactions on Knowledge and Data Engineering
Concurrent clustered programming
CONCUR 2005 - Concurrency Theory
MapReduce: simplified data processing on large clusters
OSDI '04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scalable Dynamic Load Balancing Using UPC
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Intel threading building blocks
Work-first and help-first scheduling policies for async-finish task parallelism
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Lifeline-based global load balancing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A load balancing strategy for prioritized execution of tasks
IPPS '93 Proceedings of the 1993 Seventh International Parallel Processing Symposium
A work-stealing scheduler for X10's task parallelism with suspension
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Managing Asynchronous Operations in Coarray Fortran 2.0
IPDPS '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
We present GLB, a programming model and an associated implementation that can handle a wide range of irregular parallel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to load balance statically. GLB hides the intricate synchronization details (e.g., inter-node communication, initialization and startup, load balancing, termination detection, and result collection) from its users. Internally, GLB uses a version of the lifeline-graph-based work-stealing algorithm proposed by Saraswat et al. [25]. Users of GLB need only write several pieces of sequential code that comply with the GLB interface; GLB then schedules and orchestrates the parallel execution of that code correctly and efficiently at scale. We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). BC can be statically load-balanced, whereas UTS cannot. In both cases, GLB scales well -- achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K) -- up to 16K cores.
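The abstract's division of labor -- the user writes only sequential code (process work, split work for a thief, merge stolen work), while the runtime handles distribution and stealing -- can be illustrated with a minimal sketch. All names here (TaskBag, process, split, merge) are hypothetical stand-ins, not the actual GLB API, and the single-place driver merely simulates what a distributed lifeline-based runtime would orchestrate:

```java
// Hypothetical sketch of a GLB-style user interface. The user supplies
// sequential code only; a real runtime would distribute bags across places
// and invoke split()/merge() to service steals. Names are illustrative.
import java.util.ArrayDeque;
import java.util.Deque;

public class GlbSketch {
    // User-facing contract (hypothetical): sequential work + split/merge hooks.
    interface TaskBag<R> {
        boolean process(int n);       // run up to n units of sequential work
        TaskBag<R> split();           // give away part of the work, or null
        void merge(TaskBag<R> other); // absorb work stolen from a victim
        R result();                   // local partial result
    }

    // Toy task bag: count the nodes of an implicit full binary tree.
    static class TreeCount implements TaskBag<Long> {
        private final Deque<Integer> frontier = new ArrayDeque<>();
        private long count = 0;
        TreeCount(int depth) { frontier.push(depth); }
        private TreeCount() {}

        public boolean process(int n) {
            for (int i = 0; i < n && !frontier.isEmpty(); i++) {
                int d = frontier.pop();
                count++;                                   // visit one node
                if (d > 0) { frontier.push(d - 1); frontier.push(d - 1); }
            }
            return !frontier.isEmpty();                    // more work left?
        }
        public TaskBag<Long> split() {
            if (frontier.size() < 2) return null;          // too little to share
            TreeCount half = new TreeCount();
            for (int i = frontier.size() / 2; i > 0; i--)
                half.frontier.push(frontier.pollLast());   // hand over old work
            return half;
        }
        public void merge(TaskBag<Long> other) {
            TreeCount t = (TreeCount) other;
            frontier.addAll(t.frontier);
            count += t.count;
        }
        public Long result() { return count; }
    }

    public static void main(String[] args) {
        // Single-place stand-in for the runtime: drain the bag in chunks.
        TreeCount bag = new TreeCount(10);                 // tree of depth 10
        while (bag.process(64)) { /* a runtime would answer steals here */ }
        // A depth-10 full binary tree has 2^11 - 1 = 2047 nodes.
        System.out.println(bag.result());
    }
}
```

The point of the sketch is the inversion of control: nothing in the user's code mentions places, messages, lifelines, or termination detection -- exactly the synchronization machinery the abstract says GLB hides.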