CRAUL: Compiler and run-time integration for adaptation under load [1]

[1] This work was supported in part by NSF grants CDA-9401142, CCR-9702466, and CCR-9705594, and by an external research grant from Compaq.

Authors:
Sotiris Ioannidis;Umit Rencuzogullari;Robert Stets;Sandhya Dwarkadas
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, USA. Tel.: +1 716 275 5647; Fax: +1 716 461 2018; E-mail: {si,umit,stets,sandhya}@cs.r ... (shared by all four authors; the last-listed author is the corresponding author)
Venue:
Scientific Programming
Year:
1999

Abstract

Clusters of workstations provide a cost-effective, high-performance parallel computing environment. These environments, however, are often shared by multiple users or may consist of heterogeneous machines. As a result, parallel applications executing in these environments must operate despite unequal computational resources. For maximum performance, applications should automatically adapt their execution to make the best use of the available resources. Ideally, this adaptation should be transparent to the application programmer. In this paper, we present CRAUL (Compiler and Run-Time Integration for Adaptation Under Load), a system that dynamically balances computational load in a parallel application. Our target run-time is software-based distributed shared memory (SDSM). SDSM is a good target for parallelizing compilers because it reduces compile-time complexity by providing data caching and other support for dynamic load balancing. CRAUL combines compile-time support for identifying data access patterns with a run-time system that uses the access information to distribute the parallel workload intelligently in loop-based programs. The distribution is chosen according to the relative power of the processors, so as to minimize SDSM overhead and maximize locality. We have evaluated the resulting load distribution in the presence of three types of load: computational load, combined computational and memory-intensive load, and network load. CRAUL performs within 5-23% of ideal in the presence of load, and it improves on naive compiler-based work distribution that does not take locality into account, even in the absence of load.
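
The idea of distributing loop iterations according to the relative power of the processors can be illustrated with a minimal sketch. This is not CRAUL's actual algorithm, which the abstract does not spell out; the function name `partition_iterations` and the `relative_power` list are hypothetical, and the sketch simply assumes that each processor's power is already known as a non-negative number. Each processor receives one contiguous block of iterations, a simple stand-in for the locality goal mentioned above.

```python
def partition_iterations(n_iters, relative_power):
    """Split iterations 0..n_iters-1 into contiguous half-open blocks,
    one per processor, sized in proportion to relative_power.

    Hypothetical helper for illustration only; not taken from CRAUL.
    """
    if n_iters < 0:
        raise ValueError("n_iters must be non-negative")
    if not relative_power or any(p < 0 for p in relative_power):
        raise ValueError("relative_power must be non-empty and non-negative")
    total = sum(relative_power)
    if total <= 0:
        raise ValueError("at least one processor must have positive power")

    bounds = []
    start = 0
    cumulative = 0.0
    last = len(relative_power) - 1
    for i, power in enumerate(relative_power):
        cumulative += power
        if i == last:
            # Force the final block to end at n_iters so that no
            # iteration is lost to floating-point rounding.
            end = n_iters
        else:
            end = round(n_iters * cumulative / total)
            # Keep block boundaries monotonic and inside the loop range.
            end = max(start, min(end, n_iters))
        bounds.append((start, end))
        start = end
    return bounds


# One processor twice as powerful as the other two.
print(partition_iterations(100, [2, 1, 1]))
# prints [(0, 50), (50, 75), (75, 100)]
```

A run-time system in this style would re-estimate the power values as load changes and recompute the blocks between loop executions; how CRAUL measures power and accounts for SDSM overhead is described in the paper itself, not here.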