On a distributed shared memory (DSM) system, optimizing memory accesses is essential for good performance. The authors propose an optimizing compiler that controls a software cache implemented on a DSM. The software cache has a static part, managed by the compiler, and a dynamic part, managed by runtime cache routines. The compiler controls the static part using information from static analysis. For applications whose access behavior can be determined only at run time, the compiler relies on the dynamic part instead. The authors also propose applying RISC-oriented optimization techniques to parallel applications running on this software cache system. They evaluate the compiler and the RISC-oriented optimizations on the CM-5 distributed-memory parallel machine. The results show that both considerably improve the performance of three basic linear algebra routines: matrix multiplication, Cholesky decomposition, and Gaussian elimination.
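The split between a compiler-managed static part and a runtime-managed dynamic part can be illustrated with a toy model. The sketch below is not the authors' implementation; it is a minimal Python model under stated assumptions. All names (`RemoteMemory`, `SoftwareCache`, `BLOCK`, `read_checked`, `prefetch_range`, `read_unchecked`, `sum_static`, `sum_dynamic`) are hypothetical. When the accessed range is known at compile time, the "compiler" emits one bulk fetch before the loop and then uses unchecked accesses. When indices are known only at run time, every access goes through a runtime routine that checks the cache tag and fetches the block on a miss.

```python
# Hypothetical toy model of a software cache on a DSM; not the paper's code.

BLOCK = 4  # words per cache block (assumed parameter)


class RemoteMemory:
    """Stand-in for memory owned by another DSM node; counts block transfers."""

    def __init__(self, data):
        self.data = list(data)
        self.transfers = 0

    def fetch_block(self, block_id):
        self.transfers += 1
        start = block_id * BLOCK
        return self.data[start:start + BLOCK]


class SoftwareCache:
    def __init__(self, remote):
        self.remote = remote
        self.lines = {}   # block_id -> cached copy of the block's words
        self.checks = 0   # number of runtime tag checks performed

    # Dynamic part: runtime routine used when the compiler cannot prove
    # that the block is resident (e.g. a data-dependent index).
    def read_checked(self, addr):
        self.checks += 1
        block_id, offset = divmod(addr, BLOCK)
        if block_id not in self.lines:
            self.lines[block_id] = self.remote.fetch_block(block_id)
        return self.lines[block_id][offset]

    # Static part: a compiler-inserted fetch of every block in [lo, hi)
    # before a loop whose access range is known statically...
    def prefetch_range(self, lo, hi):
        if hi <= lo:
            return
        for block_id in range(lo // BLOCK, (hi - 1) // BLOCK + 1):
            if block_id not in self.lines:
                self.lines[block_id] = self.remote.fetch_block(block_id)

    # ...after which accesses inside the loop skip the tag check entirely.
    def read_unchecked(self, addr):
        block_id, offset = divmod(addr, BLOCK)
        return self.lines[block_id][offset]


def sum_static(cache, lo, hi):
    """Loop over a statically known range: one bulk fetch, no per-access checks."""
    cache.prefetch_range(lo, hi)
    return sum(cache.read_unchecked(a) for a in range(lo, hi))


def sum_dynamic(cache, indices):
    """Loop over indices known only at run time: every access is checked."""
    return sum(cache.read_checked(a) for a in indices)
```

For example, with `RemoteMemory(range(16))`, `sum_static(cache, 0, 8)` transfers two blocks and performs no tag checks. In contrast, `sum_dynamic(cache, [5, 1, 12, 6])` on a fresh cache transfers three blocks and performs four checks, because the second access to block 1 hits. The model shows only why static knowledge lets the compiler remove runtime overhead. It does not reproduce the paper's coherence handling or its CM-5 measurements.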