To execute shared-memory parallel programs efficiently on distributed-memory systems that lack remote-caching hardware, software-caching mechanisms must be used. We have proposed two compiler-assisted software-caching schemes. One is a page-based system (Asymmetric Distributed Shared Memory: ADSM) that uses virtual memory mechanisms only for read cache misses. The other is a fully user-level system (User-level Distributed Shared Memory: UDSM) that uses user-level checking code and consistency-management code.

In these schemes, an optimizing compiler directly analyzes the shared-memory source programs and optimizes them. It exploits middle-grained or coarse-grained remote-memory accesses to reduce both the volume of communication and the overhead of the cache-emulation code. It performs interprocedural points-to analysis and interprocedural shared-access set calculation, using interval analysis to solve redundancy-elimination equations under the lazy release consistency model. We implemented this optimizing compiler for both ADSM and UDSM, together with a run-time system for user-level cache emulation.

The run-time system runs on a cluster of SS20 workstations connected by 100BASE-TX Ethernet. Both schemes achieve a high speed-up ratio on the SPLASH-2 benchmark suite. The experimental results show that combining the optimizing compiler with software DSM is very effective. They also show that the performance of the ADSM scheme is limited by the communication of unnecessary data, while that of the UDSM scheme is limited by instrumentation overhead.
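As a rough illustration of the instrumentation overhead mentioned for the user-level scheme, the sketch below contrasts a loop that runs a software validity check before every shared read with one in which a single coarser-grained check has been hoisted out of the loop. It is a toy model under stated assumptions, not the compiler or run-time system described above: the `SoftwareCache` class, the `ensure` method, the block size, and both summation functions are hypothetical names introduced only for this example, and the "remote" memory is simulated by a local list.

```python
BLOCK = 4  # hypothetical cache-block size, in elements


class SoftwareCache:
    """Toy user-level software cache over a simulated remote array.

    All names here are assumptions made for illustration only.
    """

    def __init__(self, remote):
        self.remote = remote                 # stands in for remote memory
        self.local = [None] * len(remote)    # local cached copy
        self.valid = set()                   # indices of blocks cached locally
        self.checks = 0                      # validity checks executed
        self.fetches = 0                     # simulated remote block fetches

    def ensure(self, first, last):
        """Check, and fetch if missing, every block covering first..last."""
        self.checks += 1
        for b in range(first // BLOCK, last // BLOCK + 1):
            if b not in self.valid:
                lo = b * BLOCK
                hi = min(lo + BLOCK, len(self.remote))
                self.local[lo:hi] = self.remote[lo:hi]
                self.valid.add(b)
                self.fetches += 1


def sum_naive(cache, n):
    """Per-access instrumentation: one check before every shared read."""
    total = 0
    for i in range(n):
        cache.ensure(i, i)
        total += cache.local[i]
    return total


def sum_hoisted(cache, n):
    """Redundant checks removed: one check covers the whole accessed range."""
    if n > 0:
        cache.ensure(0, n - 1)
    total = 0
    for i in range(n):
        total += cache.local[i]
    return total
```

In this model both versions fetch the same blocks and compute the same sum; only the number of executed checks differs, which is the kind of overhead a redundancy-eliminating compiler pass would aim to reduce. The sketch does not model consistency management or message coalescing.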