The memory consistency model in shared memory parallel programming controls the order in which memory operations performed by one thread may be observed by another. The most natural model for programmers is to have memory accesses appear to take effect in the order specified in the original program. Language designers have been reluctant to use this strong semantics, called sequential consistency, due to concerns over the performance of memory fence instructions and related mechanisms that guarantee order. In this paper, we provide evidence for the practicality of sequential consistency by showing that advanced compiler analysis techniques are sufficient to eliminate the need for most memory fences and enable high-level optimizations. Our analyses eliminated over 97% of the memory fences that were needed by a naïve implementation, accounting for 87 to 100% of the dynamically encountered fences in all but one benchmark. The impact of the memory model and analysis on runtime performance depends on the quality of the optimizations: more aggressive optimizations are likely to be invalidated by a strong memory consistency semantics. We consider two specific optimizations (pipelining of bulk memory copies, and communication aggregation and scheduling for irregular accesses) and show that our most aggressive analysis is able to obtain the same performance as the relaxed model when applied to two linear algebra kernels. While additional work on parallel optimizations and analyses is needed, we believe these results provide important evidence on the viability of using a simple memory consistency model without sacrificing performance.
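As an illustration of what sequential consistency guarantees, the following minimal Python sketch enumerates the outcomes of the classic "store buffering" litmus test. It is not taken from the paper: the names `T0`, `T1`, `interleavings`, `run`, and `outcomes` are hypothetical, and the "relaxed" model here is a deliberate simplification that only lets each thread swap its store with its later load to a different variable. A memory fence between the two operations would forbid that swap, which is the ordering cost the abstract refers to.

```python
# Store-buffering litmus test, modelled as a toy (assumption: not the paper's code).
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
T0 = [("store", "x", 1), ("load", "y", "r0")]
T1 = [("store", "y", 1), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against shared memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "store":
            memory[var] = arg          # arg is the value to write
        else:
            regs[arg] = memory[var]    # arg is the destination register
    return regs["r0"], regs["r1"]


def outcomes(t0, t1):
    """All (r0, r1) results reachable by interleaving t0 and t1."""
    return {run(s) for s in interleavings(t0, t1)}


# Sequential consistency: each thread's operations stay in program order.
sc_outcomes = outcomes(T0, T1)

# Simplified relaxed model (assumption): a thread may also issue its load
# before its store, since the two touch different variables.
relaxed_outcomes = set()
for a in (T0, T0[::-1]):
    for b in (T1, T1[::-1]):
        relaxed_outcomes |= outcomes(a, b)

print(sorted(sc_outcomes))       # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed_outcomes))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under sequential consistency at least one thread must observe the other's store, so `(0, 0)` never appears. Once the store and load may be reordered, `(0, 0)` becomes possible. A compiler that can prove no other thread observes the difference can drop the fence without changing the set of visible outcomes.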