This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by most other fine-grain DSM systems. Source-level analysis makes existing access-check optimizations (e.g., access-check batching) more effective and enables two novel fine-grain DSM optimizations: object-graph aggregation and automatic computation migration.

The compiler detects situations where an access to a root object is followed by accesses to subobjects. Jackal attempts to aggregate all access checks on objects in such object graphs into a single check on the graph's root object. If this check fails, the entire graph is fetched. Object-graph aggregation can reduce the number of network roundtrips and, since it is an advanced form of access-check batching, improves sequential performance.

Computation migration (or function shipping) is used to optimize critical sections in which a single processor owns both the shared data that is accessed and the lock that protects the data. It is usually more efficient to execute such critical sections on the processor that holds the lock and the data than to incur multiple roundtrips for acquiring the lock, fetching the data, writing the data back, and releasing the lock. Jackal's compiler detects such critical sections and optimizes them by generating single-roundtrip computation-migration code rather than standard data-shipping code. Jackal's optimizations improve both sequential and parallel application performance. On average, sequential execution times of instrumented, optimized programs are within 10% of those of uninstrumented programs. Application speedups usually improve significantly, and several Jackal applications perform as well as hand-optimized message-passing programs.
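To make the effect of object-graph aggregation concrete, here is a minimal single-process sketch. It is not Jackal's implementation or runtime interface. The names `DsmNode`, `check`, `checkGraph`, and `Stats` are hypothetical, and the model only counts access checks and fetch roundtrips. It assumes that the objects of a graph are always cached and invalidated together, so a hit on the root implies that every subobject is present locally.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class ObjectGraphAggregation {

    /** Hypothetical DSM node: records which object ids are cached locally. */
    public static final class DsmNode {
        private final Set<Integer> cached = new HashSet<>();
        public int checks = 0;      // access checks executed
        public int roundtrips = 0;  // fetch requests sent to the objects' home node

        /** Per-object access check: a miss fetches only that object (one roundtrip). */
        void check(int id) {
            checks++;
            if (!cached.contains(id)) {
                roundtrips++;
                cached.add(id);
            }
        }

        /** Aggregated check on the root: a miss fetches the whole graph (one roundtrip). */
        void checkGraph(int rootId, List<Integer> graph) {
            checks++;
            if (!cached.contains(rootId)) {
                roundtrips++;
                cached.addAll(graph);  // assumption: the graph is cached as a unit
            }
        }
    }

    /** Counters observed after running one access pattern. */
    public static final class Stats {
        public final int checks, roundtrips, sum;

        Stats(DsmNode node, int sum) {
            this.checks = node.checks;
            this.roundtrips = node.roundtrips;
            this.sum = sum;
        }

        @Override
        public String toString() {
            return "checks=" + checks + " roundtrips=" + roundtrips + " sum=" + sum;
        }
    }

    /** Hypothetical graph: root object 0 referencing subobjects 1..n; object i holds value i. */
    static List<Integer> graph(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            ids.add(i);
        }
        return ids;
    }

    /** Unoptimized instrumentation: one access check before every object access. */
    public static Stats naive(int n) {
        DsmNode node = new DsmNode();
        int sum = 0;
        for (int id : graph(n)) {
            node.check(id);
            sum += id;  // "read" the object's value
        }
        return new Stats(node, sum);
    }

    /** Aggregated instrumentation: one check on the root covers all subobject accesses. */
    public static Stats aggregated(int n) {
        DsmNode node = new DsmNode();
        List<Integer> g = graph(n);
        node.checkGraph(0, g);
        int sum = 0;
        for (int id : g) {
            sum += id;  // no further checks: the whole graph is already local
        }
        return new Stats(node, sum);
    }

    public static void main(String[] args) {
        System.out.println("naive:      " + naive(8));
        System.out.println("aggregated: " + aggregated(8));
    }
}
```

With a root and eight subobjects, `main` prints `checks=9 roundtrips=9 sum=36` for the naive version and `checks=1 roundtrips=1 sum=36` for the aggregated one. Both versions read the same data. The aggregated version runs fewer checks, which lowers sequential overhead, and it fetches the graph in a single message.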
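The roundtrip argument for computation migration can be illustrated with a rough message-count model. It follows the abstract's accounting: under data shipping, acquiring the lock, fetching the data, writing it back, and releasing the lock each cost one roundtrip. Migration instead costs a single request/reply pair. This is a sketch, not Jackal's generated code. `OwnerNode`, `execute`, and the two client methods are hypothetical names, a Java lambda stands in for the compiler-generated migrated critical section, and the owner is assumed to be uncontended.

```java
import java.util.function.IntUnaryOperator;

public final class ComputationMigration {

    /** Hypothetical remote node that owns both a shared counter and the lock protecting it. */
    public static final class OwnerNode {
        private int value;
        private boolean locked;
        private int roundtrips;  // request/reply pairs this node has served

        // Data-shipping protocol: each step is a separate roundtrip from the requester.
        public void acquireLock()    { roundtrips++; locked = true; }
        public int fetch()           { roundtrips++; return value; }
        public void writeBack(int v) { roundtrips++; value = v; }
        public void releaseLock()    { roundtrips++; locked = false; }

        // Computation migration: the critical section itself is shipped to the owner.
        public int execute(IntUnaryOperator criticalSection) {
            roundtrips++;  // one request (code + arguments) and one reply (result)
            locked = true; // the lock is local here, so acquiring it costs no message
            try {
                value = criticalSection.applyAsInt(value);
                return value;
            } finally {
                locked = false;
            }
        }

        public int value()        { return value; }
        public int roundtrips()   { return roundtrips; }
        public boolean isLocked() { return locked; }
    }

    /** Increments the remote counter {@code times} times by shipping lock and data. */
    public static int incrementByDataShipping(OwnerNode owner, int times) {
        for (int i = 0; i < times; i++) {
            owner.acquireLock();
            int v = owner.fetch();
            owner.writeBack(v + 1);
            owner.releaseLock();
        }
        return owner.roundtrips();
    }

    /** Increments the remote counter {@code times} times by shipping the computation. */
    public static int incrementByMigration(OwnerNode owner, int times) {
        for (int i = 0; i < times; i++) {
            owner.execute(v -> v + 1);
        }
        return owner.roundtrips();
    }

    public static void main(String[] args) {
        OwnerNode shipped = new OwnerNode();
        OwnerNode migrated = new OwnerNode();
        System.out.println("data shipping: " + incrementByDataShipping(shipped, 10)
                + " roundtrips, value=" + shipped.value());
        System.out.println("migration:     " + incrementByMigration(migrated, 10)
                + " roundtrips, value=" + migrated.value());
    }
}
```

For ten executions of the critical section, both strategies leave the counter at 10. Data shipping costs 40 roundtrips in this model, while migration costs 10. The model ignores message size and the cost of running the code on the owner, both of which a real system would also have to weigh.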