High-level parallel programming models that support dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to load balancing. Load-balancing techniques must incur low enough overhead to support fine-grained threads while also being sophisticated enough to preserve data locality and thread execution priority.

This paper presents a hierarchical framework that addresses these conflicting goals by viewing the computation as a collection of thread subsets, each of which is load-balanced independently. Previous processor-centric approaches advocate a single uniform policy for load-balancing all threads in a computation. In contrast, our framework allows each thread subset to be load-balanced using the policy best suited to its characteristics (e.g., sensitivity to locality or to priority). The framework consists of two parts: (i) language support that permits the programmer or the compiler to tag different thread subsets with appropriate policies, and (ii) run-time support that synthesizes overall application load balance by composing these individual policies.

This framework has been implemented in the Illinois Concert runtime system, an implementation platform for fine-grained concurrent object-oriented languages. Results for four large irregular applications on the Cray T3D and the SGI Origin 2000 demonstrate the advantages of the hierarchical framework: performance improves by up to an order of magnitude compared to using a uniform load-balancing policy.
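The abstract describes the framework only at a high level. The Python sketch below illustrates the central idea under simplifying assumptions: thread subsets are tagged with their own policies, each subset is balanced independently, and the runtime composes the results into per-processor run queues. It is not the Concert runtime's interface. Every name here (`Thread`, `LocalityPolicy`, `PriorityPolicy`, `HierarchicalScheduler`, `tag`, `spawn`, `schedule`) is hypothetical. The sketch also assumes a one-shot static assignment with composition by simple concatenation, whereas the real system balances load dynamically as threads are created.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Thread:
    name: str
    home: int          # processor owning the thread's data (locality hint)
    priority: int = 0  # smaller value = more urgent


Queues = List[List[Thread]]


class LocalityPolicy:
    """Keep each thread on the processor that owns its data; never migrate."""

    def assign(self, threads: List[Thread], nprocs: int) -> Queues:
        queues: Queues = [[] for _ in range(nprocs)]
        for t in threads:
            queues[t.home % nprocs].append(t)
        return queues


class PriorityPolicy:
    """Ignore locality: order threads by priority, then deal them round-robin
    so that every processor starts on the most urgent work available."""

    def assign(self, threads: List[Thread], nprocs: int) -> Queues:
        queues: Queues = [[] for _ in range(nprocs)]
        for i, t in enumerate(sorted(threads, key=lambda th: th.priority)):
            queues[i % nprocs].append(t)
        return queues


class HierarchicalScheduler:
    """Hypothetical runtime: balances each tagged subset with its own policy,
    then composes the per-subset results into one queue per processor."""

    def __init__(self, nprocs: int):
        self.nprocs = nprocs
        self.subsets: Dict[str, Tuple[object, List[Thread]]] = {}

    def tag(self, name: str, policy) -> None:
        # Stand-in for the language-level tagging of a thread subset.
        self.subsets[name] = (policy, [])

    def spawn(self, tag: str, thread: Thread) -> None:
        self.subsets[tag][1].append(thread)

    def schedule(self) -> Queues:
        per_proc: Queues = [[] for _ in range(self.nprocs)]
        # Each subset is balanced independently; composition here is plain
        # concatenation in tag-registration order (an assumption).
        for policy, threads in self.subsets.values():
            for p, queue in enumerate(policy.assign(threads, self.nprocs)):
                per_proc[p].extend(queue)
        return per_proc


sched = HierarchicalScheduler(nprocs=2)
sched.tag("locality", LocalityPolicy())
sched.tag("priority", PriorityPolicy())
sched.spawn("locality", Thread("a", home=0))
sched.spawn("locality", Thread("b", home=0))
sched.spawn("priority", Thread("x", home=0, priority=2))
sched.spawn("priority", Thread("y", home=0, priority=1))
print([[t.name for t in q] for q in sched.schedule()])
# [['a', 'b', 'y'], ['x']]
```

Under a uniform `LocalityPolicy`, all four threads would land on processor 0. Tagging the priority-sensitive subset separately keeps `a` and `b` with their data while still spreading `x` and `y` across both processors.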