A hierarchical load-balancing framework for dynamic multithreaded computations

Authors:
Vijay Karamcheti;Andrew A. Chien
Affiliations:
New York University;University of Illinois, Urbana-Champaign
Venue:
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Year:
1998

References: 27
Cited by: 4

A fast algorithm for particle simulations

Journal of Computational Physics
PRESTO: a system for object-oriented parallel programming

Software—Practice & Experience
The design and analysis of spatial data structures

The design and analysis of spatial data structures
ABCL: an object-oriented concurrent system

ABCL: an object-oriented concurrent system
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Implementing an irregular application on a distributed memory multiprocessor

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data locality and load balancing in COOL

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
CHARM++: a portable concurrent object oriented system based on C++

OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
A parallel hashed Oct-Tree N-body algorithm

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Parallel Visualization Algorithms: Performance and Architectural Implications

Computer
Supporting dynamic data structures on distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Cilk: an efficient multithreaded runtime system

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A hierarchical CPU scheduler for multimedia operating systems

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
The Nexus approach to integrating multithreading and communication

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Evaluating high level parallel programming support for irregular applications in ICC++

Software—Practice & Experience
On the design of Chant: a talking threads package

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Scalable parallel formulations of the Barnes-Hut method for N-body simulations

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs

IEEE Parallel & Distributed Technology: Systems & Technology
Easy-to-Use Object-Oriented Parallel Processing with Mentat

Computer
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs

IEEE Transactions on Parallel and Distributed Systems
Optimizing COOP Languages: Study of a Protein Dynamics Program

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
ICC++ - A C++ Dialect for High Performance Parallel Computing

ISOTAS '96 Proceedings of the Second JSSST International Symposium on Object Technologies for Advanced Software
Converse: An Interoperable Framework for Parallel Programming

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Supporting High Level Programming with High Performance: The Illinois Concert System

HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Run-time techniques for dynamic multithreaded computations

Run-time techniques for dynamic multithreaded computations

Cited by:

Adaptive Memory Allocations in Clusters to Handle Unexpectedly Large Data-Intensive Jobs

IEEE Transactions on Parallel and Distributed Systems
Lithe: enabling efficient composition of parallel libraries

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Hierarchical scheduling of DAG structured computations on manycore processors with dynamic thread grouping

JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Towards jungle computing with Ibis/Constellation

Proceedings of the 2011 workshop on Dynamic distributed data-intensive applications, programming abstractions, and systems

Abstract

High-level parallel programming models that support dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to load-balancing. Load-balancing techniques must incur overhead low enough to support fine-grained threads while remaining sophisticated enough to preserve data locality and thread execution priority.

This paper presents a hierarchical framework that addresses these conflicting goals by viewing the computation as being made up of different thread subsets, each of which is load-balanced independently. In contrast to previous processor-centric approaches that have advocated the use of a uniform policy for load-balancing all threads in a computation, our framework allows each thread subset to be load-balanced using the policy most suited to its characteristics (e.g., locality or priority sensitivity). The framework consists of two parts: (i) language support that permits a programmer or the compiler to tag different thread subsets with appropriate policies, and (ii) run-time support that synthesizes overall application load-balance by composing these individual policies.

This framework has been implemented in the Illinois Concert runtime system, an implementation platform for fine-grained concurrent object-oriented languages. Results for four large irregular applications on the Cray T3D and the SGI Origin 2000 demonstrate the advantages of the hierarchical framework: performance improves by up to an order of magnitude compared with using a uniform load-balancing policy.
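The two-part structure described in the abstract (subsets tagged with policies, and a top level that composes them) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Concert runtime's interface: every name here (`FifoPolicy`, `PriorityPolicy`, `HierarchicalScheduler`, the tags) is hypothetical, the round-robin composition rule is an assumption chosen for simplicity, and the model runs in a single process with no processors, migration, or data locality.

```python
import heapq
from collections import deque


class FifoPolicy:
    """Hypothetical per-subset policy: hand out threads in creation order."""

    def __init__(self):
        self._queue = deque()

    def push(self, thread, priority=0):
        self._queue.append(thread)  # priority is ignored by this policy

    def pop(self):
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


class PriorityPolicy:
    """Hypothetical per-subset policy: lowest priority value runs first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps equal priorities in insertion order

    def push(self, thread, priority=0):
        heapq.heappush(self._heap, (priority, self._seq, thread))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class HierarchicalScheduler:
    """Toy top level: composes per-subset policies by round-robin (assumed)."""

    def __init__(self):
        self._subsets = {}     # tag -> policy instance
        self._order = deque()  # round-robin order over tags

    def register(self, tag, policy):
        self._subsets[tag] = policy
        self._order.append(tag)

    def spawn(self, tag, thread, priority=0):
        # The tag decides which policy manages this thread.
        self._subsets[tag].push(thread, priority)

    def next_thread(self):
        # Visit each subset at most once, starting after the last one served.
        for _ in range(len(self._order)):
            tag = self._order[0]
            self._order.rotate(-1)
            if len(self._subsets[tag]) > 0:
                return tag, self._subsets[tag].pop()
        return None  # every subset is empty


def demo():
    sched = HierarchicalScheduler()
    sched.register("local", FifoPolicy())
    sched.register("urgent", PriorityPolicy())
    sched.spawn("local", "a")
    sched.spawn("local", "b")
    sched.spawn("urgent", "low", priority=5)
    sched.spawn("urgent", "high", priority=1)
    order = []
    item = sched.next_thread()
    while item is not None:
        order.append(item)
        item = sched.next_thread()
    return order
```

Calling `demo()` alternates between the two subsets, with each subset ordering its own threads by its own policy. The point of the design is that the top level never inspects a thread directly; it only chooses which subset to serve next, so a policy can be swapped per subset without touching the others.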