A hybrid execution model for fine-grained languages on distributed memory multicomputers

  • Authors:
  • John Plevyak; Vijay Karamcheti; Xingbin Zhang; Andrew A. Chien

  • Affiliations:
  • Department of Computer Science, 1304 W. Springfield Avenue, Urbana, IL (all authors)

  • Venue:
  • Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing
  • Year:
  • 1995

Abstract

While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution, and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (where the responsibility to reply is passed along, much as with call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C and is therefore easily portable to many systems. Experiments with function-call-intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM-5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
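
To make the two-version idea concrete, the following is a minimal, self-contained C sketch, not the paper's actual runtime: the names frame_t, fib_seq, fib_par, spawn_or_call, and is_local are hypothetical illustrations, and the "remote" work is simulated by a local call, whereas in the actual model the heap-frame path would issue asynchronous messages so the caller can hide latency.

    /*
     * Minimal sketch (assumptions, not the authors' API): each method has a
     * sequential version that runs as an ordinary stack-based C call, and a
     * parallel version whose activation lives in a heap-allocated frame so the
     * computation can suspend and synchronize cheaply on a flag.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct frame {
        int arg;     /* saved argument for a suspended activation      */
        int result;  /* slot the callee fills in (future-like value)   */
        int ready;   /* cheap synchronization flag                     */
    } frame_t;

    /* Sequential version: ordinary C call, result returned on the stack. */
    static int fib_seq(int n) {
        if (n < 2) return n;
        return fib_seq(n - 1) + fib_seq(n - 2);
    }

    /* Parallel version: the activation lives in a heap frame, so the caller
     * could suspend while another node computes and later writes `result`.
     * Here the "remote" work is simulated by calling the sequential code. */
    static void fib_par(frame_t *f) {
        f->result = fib_seq(f->arg);  /* stand-in for remote, latency-hidden work */
        f->ready = 1;                 /* signal the waiting continuation          */
    }

    /* Dynamic adaptation: if the data is local, coalesce the thread into the
     * caller's stack; otherwise fall back to the heap-frame (parallel) path. */
    static int spawn_or_call(int n, int is_local) {
        if (is_local) {
            return fib_seq(n);        /* sequential efficiency: plain C call */
        }
        frame_t *f = malloc(sizeof *f);
        f->arg = n;
        f->ready = 0;
        fib_par(f);                   /* would be an asynchronous send in the real model */
        while (!f->ready) { }         /* placeholder for suspending / latency hiding */
        int r = f->result;
        free(f);
        return r;
    }

    int main(void) {
        printf("local:  %d\n", spawn_or_call(20, 1));
        printf("remote: %d\n", spawn_or_call(20, 0));
        return 0;
    }

The point of the two paths in this sketch is that when locality can be established at run time, the thread is coalesced into an ordinary C call and pays no frame-allocation or synchronization cost; only the non-local case pays for a heap-allocated activation frame and an explicit synchronization flag.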