Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Eliminating false data dependences using the Omega test
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The KSR1: experimentation and modeling of poststore
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
An optimizing Fortran D compiler for MIMD distributed-memory machines
An optimizing Fortran D compiler for MIMD distributed-memory machines
LCM: memory system support for parallel language implementation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data forwarding in scalable shared-memory multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Teapot: language support for writing memory coherence protocols
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
HPF on Fine-Grain Distributed Shared Memory: Early Experience
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Tempest: a substrate for portable parallel programs
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Compiling for Locality of Reference
Compiling for Locality of Reference
Shared-memory performance profiling
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
Retrospective: tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
A synthesis of memory mechanisms for distributed architectures
ICS '01 Proceedings of the 15th international conference on Supercomputing
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs
International Journal of Parallel Programming
Efficient Support for Two-Dimensional Data Distributions in Distributed Shared Memory Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Controlling Distributed Shared Memory Consistency from High Level Programming Languages
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Evaluating the Effectiveness of a Parallelizing Compiler
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compilation and Runtime-Optimizations for Software Distributed Shared Memory
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A Comparison of Locality Transformations for Irregular Codes
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compile-time Synchronization Optimizations for Software DSMs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Update Protocols and Iterative Scientific Applications
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Compiling for a hybrid programming model using the LMAD representation
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Evaluation of a virtual shared memory machine by the compilation of data parallel loops
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Update protocols and cluster-based shared memory
Computer Communications
Hi-index | 0.00 |
Unlike compiler-generated message-passing code, the coherence mechanisms in shared-memory systems work equally well for regular and irregular programs. In many programs, however compile-time information about data accesses would permit data to be transferred more efficiently---if the underlying shared-memory system offered suitable primitives. This paper demonstrates that cooperation between a compiler and a memory coherence protocol can improve the performance of High Performance Fortran (HPF) programs running on fine-grain distributed shared memory system up to a factor of 2, while retaining the versatility and portability of shared memory. As a consequence, shared memory's performance becomes competitive with message passing for regular applications, while not affecting (or in some cases, even improving) its large advantage for irregular codes. This paper describes the design of our implementation and reports experimental results.