A hybrid execution model for fine-grained languages on distributed memory multicomputers

  • Authors:
  • John Plevyak; Vijay Karamcheti; Xingbin Zhang; Andrew A. Chien

  • Affiliations:
  • Department of Computer Science, 1304 W. Springfield Avenue, Urbana, IL (all authors)

  • Venue:
  • Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing
  • Year:
  • 1995

Abstract

While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution, and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (where the responsibility to reply is passed along, much as with call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C and is therefore easily portable to many systems. Experiments with function-call-intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM-5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
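
To make the two-version idea concrete, the following is a minimal, self-contained C sketch, not the paper's actual runtime: the names frame_t, fib_seq, fib_par, spawn_or_call, and is_local are hypothetical illustrations, and the "remote" work is simulated by a local call, whereas in the actual model the heap-frame path would issue asynchronous messages so the caller can hide latency.

    /*
     * Minimal sketch (assumptions, not the authors' API): each method has a
     * sequential version that runs as an ordinary stack-based C call, and a
     * parallel version whose activation lives in a heap-allocated frame so the
     * computation can suspend and synchronize cheaply on a flag.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct frame {
        int arg;     /* saved argument for a suspended activation      */
        int result;  /* slot the callee fills in (future-like value)   */
        int ready;   /* cheap synchronization flag                     */
    } frame_t;

    /* Sequential version: ordinary C call, result returned on the stack. */
    static int fib_seq(int n) {
        if (n < 2) return n;
        return fib_seq(n - 1) + fib_seq(n - 2);
    }

    /* Parallel version: the activation lives in a heap frame, so the caller
     * could suspend while another node computes and later writes `result`.
     * Here the "remote" work is simulated by calling the sequential code. */
    static void fib_par(frame_t *f) {
        f->result = fib_seq(f->arg);  /* stand-in for remote, latency-hidden work */
        f->ready = 1;                 /* signal the waiting continuation          */
    }

    /* Dynamic adaptation: if the data is local, coalesce the thread into the
     * caller's stack; otherwise fall back to the heap-frame (parallel) path. */
    static int spawn_or_call(int n, int is_local) {
        if (is_local) {
            return fib_seq(n);        /* sequential efficiency: plain C call */
        }
        frame_t *f = malloc(sizeof *f);
        f->arg = n;
        f->ready = 0;
        fib_par(f);                   /* would be an asynchronous send in the real model */
        while (!f->ready) { }         /* placeholder for suspending / latency hiding */
        int r = f->result;
        free(f);
        return r;
    }

    int main(void) {
        printf("local:  %d\n", spawn_or_call(20, 1));
        printf("remote: %d\n", spawn_or_call(20, 0));
        return 0;
    }

The point of the two paths in this sketch is that when locality can be established at run time, the thread is coalesced into an ordinary C call and pays no frame-allocation or synchronization cost; only the non-local case pays for a heap-allocated activation frame and an explicit synchronization flag.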