Source-level global optimizations for fine-grain distributed shared memory systems

Authors:
R. Veldema;R. F. H. Hofman;R. A. F. Bhoedjang;C. J. H. Jacobs;H. E. Bal
Affiliations:
Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands;Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands;Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
Venue:
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Year:
2001

Abstract

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration.

The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance.

Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code.

Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly and several Jackal applications perform as well as hand-optimized message-passing programs.
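
The effect of object-graph aggregation described in the abstract can be illustrated with a small, single-process Java model. This is only a sketch under stated assumptions, not Jackal's compiler output or runtime API: the class `AggregationSketch`, the `Node` type, the `present` flag, and the two counters are hypothetical names introduced here to count access checks and simulated network roundtrips. It also assumes the object graph is a tree and that subobjects are valid whenever the root is valid, which in the real system is something the compiler would have to establish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not Jackal's actual runtime.
class AggregationSketch {
    // Counters standing in for the two costs named in the abstract.
    static int accessChecks = 0;
    static int roundTrips = 0;

    // A shared object that may or may not have a valid local copy.
    static class Node {
        boolean present;                         // assumed validity flag
        final int value;
        final List<Node> children = new ArrayList<>();
        Node(int value) { this.value = value; }
    }

    // Per-object access check: one check per access, one fetch per miss.
    static void check(Node n) {
        accessChecks++;
        if (!n.present) {
            roundTrips++;                        // simulated fetch of this object only
            n.present = true;
        }
    }

    // Aggregated check on the root: a miss fetches the whole graph at once.
    static void checkGraph(Node root) {
        accessChecks++;
        if (!root.present) {
            roundTrips++;                        // simulated fetch of the entire graph
            markPresent(root);
        }
    }

    // Marks every object reachable from n as locally valid (assumes a tree).
    static void markPresent(Node n) {
        n.present = true;
        for (Node c : n.children) markPresent(c);
    }

    // Sums root and subobjects with a separate check before each access.
    static int sumUnaggregated(Node root) {
        check(root);
        int sum = root.value;
        for (Node c : root.children) {
            check(c);
            sum += c.value;
        }
        return sum;
    }

    // Same computation with all checks folded into one check on the root.
    static int sumAggregated(Node root) {
        checkGraph(root);
        int sum = root.value;
        for (Node c : root.children) sum += c.value;
        return sum;
    }

    // Builds a root with two subobjects, none of them locally present.
    static Node buildGraph() {
        Node root = new Node(1);
        root.children.add(new Node(2));
        root.children.add(new Node(3));
        return root;
    }

    static void reset() { accessChecks = 0; roundTrips = 0; }

    public static void main(String[] args) {
        reset();
        int a = sumUnaggregated(buildGraph());
        System.out.println("per-object: sum=" + a
                + " checks=" + accessChecks + " roundTrips=" + roundTrips);
        reset();
        int b = sumAggregated(buildGraph());
        System.out.println("aggregated: sum=" + b
                + " checks=" + accessChecks + " roundTrips=" + roundTrips);
    }
}
```

In this model, the per-object version performs three checks and three simulated roundtrips for a root with two subobjects, while the aggregated version performs one of each. The numbers describe only this toy graph and say nothing about the measured results reported in the paper.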