Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static analysis of low-level synchronization
PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
Compiling programs with user parallelism
Selected papers of the second workshop on Languages and compilers for parallel computing
The SPARC architecture manual: version 8
The SPARC architecture manual: version 8
Execution time support for adaptive scientific algorithms on distributed
Concurrency: Practice and Experience
Compiler optimizations for Fortran D on MIMD distributed-memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Data flow equations for explicitly parallel programs
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Compiling for Distributed Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
Optimizing Parallel SPMD Programs
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Communication optimizations for parallel C programs
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Retrospective: memory consistency and event ordering in scalable shared-memory multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
Basic compiler algorithms for parallel programs
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Code motion for explicitly parallel programs
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Hiding Relaxed Memory Consistency with a Compiler
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Pointer analysis for structured parallel programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Detailed cache coherence characterization for OpenMP benchmarks
Proceedings of the 18th annual international conference on Supercomputing
A hybrid hardware/software approach to efficiently determine cache coherence Bottlenecks
Proceedings of the 19th annual international conference on Supercomputing
Lightweight lock-free synchronization methods for multithreading
Proceedings of the 20th annual international conference on Supercomputing
Interprocedural slicing of multithreaded programs with applications to Java
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques
ACM Transactions on Architecture and Code Optimization (TACO)
Reordering constraints for pthread-style locks
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler optimization techniques for OpenMP programs
Scientific Programming
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
IEEE Transactions on Parallel and Distributed Systems
Implications of application usage characteristics for collective communication offload
International Journal of High Performance Computing and Networking
Techniques for efficient placement of synchronization primitives
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting global optimizations for openmp programs in the openuh compiler
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Busy-wait barrier synchronization using distributed counters with local sensor
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Efficient sequential consistency using conditional fences
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, based on work by Shasha and Snir, checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from post-wait synchronization, barriers, and locks.We demonstrate the use of this analysis by optimizing remote access on distributed memory machines. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data re-use. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer.