Generative communication in Linda
ACM Transactions on Programming Languages and Systems (TOPLAS)
Coarse-grain parallel programming in Jade
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Heterogeneous parallel programming in Jade
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Space-time memory: a parallel programming abstraction for interactive multimedia applications
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Stampede: A Programming System for Emerging Scalable Interactive Multimedia Applications
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Performance Evaluation of the Omni OpenMP Compiler
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Software and the Concurrency Revolution
Queue - Multiprocessors
A versatile stereo implementation on commodity graphics hardware
Real-Time Imaging
Integrated task and data parallel support for dynamic applications
Scientific Programming
Stampede: a cluster programming middleware for interactive stream-oriented applications
IEEE Transactions on Parallel and Distributed Systems
Improving performance of adaptive component-based dataflow middleware
Parallel Computing
A well-known problem in designing high-level parallel programming models and languages is the "granularity problem": parallel task instances that are too fine-grain incur large overheads in the parallel run-time and decrease the speed-up achieved by parallel execution, while tasks that are too coarse-grain create load imbalance and under-utilize the parallel machine. In this work we address this issue with the concept of expressing "composable computations" in a parallel programming model called "Capsules". Such composability allows execution granularity to be adjusted at run-time.

Capsules provides a unifying framework that allows composition and adjustment of granularity for both data and computation over the iteration space and the computation space. We show that this concept allows the user to express not only the granularity of execution, but also the granularity of garbage collection and of other features supported by the programming model.

We argue that this adaptability of execution granularity leads to efficient parallel execution by matching the available application concurrency to the available hardware concurrency, thereby reducing parallelization overhead. By matching, we refer to creating coarse-grain Computation Capsules that encompass multiple fine-grain computation instances. In effect, creating coarse-grain computations reduces overhead simply by reducing the number of parallel computations. This leads to: (1) reduced synchronization cost, such as for blocked searches in shared data structures; (2) reduced distribution and scheduling cost for parallel computation instances; and (3) reduced book-keeping cost for maintaining data structures, such as those tracking unfulfilled data requests.

Capsules builds on our prior work, TStreams, a data-flow oriented parallel programming framework.
Our results on an SMP machine with the Cascade Face Detector and Stereo Vision Depth applications show that adjusting execution granularity through profiling helps determine the optimal coarse-grain serial execution granularity, reduces parallelization overhead, and yields maximum application performance.
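The core idea of grouping fine-grain computation instances into coarse-grain units can be illustrated with a minimal sketch. The code below is not the Capsules API; it is a hypothetical Python illustration (the names `fine_grain_task`, `capsule`, and `run_with_granularity` are invented) showing how partitioning an iteration space into blocks of a chosen granularity reduces the number of scheduled parallel units while leaving the computed result unchanged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fine-grain work: one unit of computation per iteration index.
def fine_grain_task(i):
    return i * i

# A coarse-grain "capsule" executes a contiguous block of fine-grain
# instances serially, so the run-time schedules one unit instead of many.
def capsule(indices):
    return [fine_grain_task(i) for i in indices]

def run_with_granularity(n, capsule_size):
    # Partition the iteration space [0, n) into capsules of the chosen size.
    capsules = [range(start, min(start + capsule_size, n))
                for start in range(0, n, capsule_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        blocks = pool.map(capsule, capsules)
    # Flatten the per-capsule results; the answer does not depend on
    # the granularity chosen, only the scheduling overhead does.
    return [x for block in blocks for x in block]
```

With `capsule_size=1` every fine-grain instance is scheduled individually; with a larger size the same work is distributed as far fewer parallel units, which is the overhead reduction the abstract describes.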