Making Sequential Consistency Practical in Titanium

Authors: Amir Kamil; Jimmy Su; Katherine Yelick
Affiliations: University of California, Berkeley (all three authors)
Venue: SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing
Year: 2005

Abstract

The memory consistency model in shared memory parallel programming controls the order in which memory operations performed by one thread may be observed by another. The most natural model for programmers is to have memory accesses appear to take effect in the order specified in the original program. Language designers have been reluctant to use this strong semantics, called sequential consistency, due to concerns over the performance of memory fence instructions and related mechanisms that guarantee order. In this paper, we provide evidence for the practicality of sequential consistency by showing that advanced compiler analysis techniques are sufficient to eliminate the need for most memory fences and to enable high-level optimizations. Our analyses eliminated over 97% of the memory fences that were needed by a naïve implementation, accounting for 87 to 100% of the dynamically encountered fences in all but one benchmark. The impact of the memory model and analysis on runtime performance depends on the quality of the optimizations: more aggressive optimizations are more likely to be invalidated by strong memory consistency semantics. We consider two specific optimizations, pipelining of bulk memory copies and communication aggregation and scheduling for irregular accesses, and show that our most aggressive analysis is able to obtain the same performance as the relaxed model when applied to two linear algebra kernels. While additional work on parallel optimizations and analyses is needed, we believe these results provide important evidence on the viability of using a simple memory consistency model without sacrificing performance.
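To make concrete what it means for one thread to observe another's memory operations out of program order, and what a memory fence restores, here is a minimal sketch of the classic store-buffering (Dekker-style) litmus test. It is not taken from the paper. It is written in Java because Titanium is a Java dialect, and it exhaustively enumerates interleavings rather than running real threads. The class `StoreBuffering`, the method `outcomes`, and its two flags are hypothetical names for illustration. The "relaxed" machine is a simplifying assumption: it lets a thread perform its load before its earlier store to a different address, which is roughly the effect of a hardware store buffer.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Store-buffering litmus test, enumerated exhaustively (illustrative sketch).
 * Thread 0: x = 1; r0 = y;     Thread 1: y = 1; r1 = x;     (x, y start at 0)
 */
public class StoreBuffering {

    /** One memory operation: store 1 to addr, or load addr into register reg. */
    static final class Op {
        final boolean isStore;
        final int addr; // 0 = x, 1 = y
        final int reg;  // destination register for loads; unused for stores
        Op(boolean isStore, int addr, int reg) {
            this.isStore = isStore;
            this.addr = addr;
            this.reg = reg;
        }
    }

    /** Program order for thread tid: store to its own variable, then load the other one. */
    static List<Op> program(int tid) {
        return List.of(new Op(true, tid, -1), new Op(false, 1 - tid, tid));
    }

    /**
     * Returns every reachable final "(r0,r1)" pair.
     * allowReorder: assumed relaxed machine may run a thread's load before its store.
     * fenced: a fence between each thread's store and load forbids that reordering.
     */
    public static Set<String> outcomes(boolean allowReorder, boolean fenced) {
        Set<String> results = new TreeSet<>();
        int maxSwap = (allowReorder && !fenced) ? 1 : 0;
        for (int s0 = 0; s0 <= maxSwap; s0++) {
            for (int s1 = 0; s1 <= maxSwap; s1++) {
                List<Op> t0 = maybeSwap(program(0), s0 == 1);
                List<Op> t1 = maybeSwap(program(1), s1 == 1);
                interleave(t0, 0, t1, 0, new int[] {0, 0}, new int[] {0, 0}, results);
            }
        }
        return results;
    }

    static List<Op> maybeSwap(List<Op> ops, boolean swap) {
        return swap ? List.of(ops.get(1), ops.get(0)) : ops;
    }

    /** Explores every interleaving of t0 and t1 that keeps each thread's own order. */
    static void interleave(List<Op> t0, int i, List<Op> t1, int j,
                           int[] mem, int[] regs, Set<String> results) {
        if (i == t0.size() && j == t1.size()) {
            results.add("(" + regs[0] + "," + regs[1] + ")");
            return;
        }
        if (i < t0.size()) {
            int[] m = mem.clone(), r = regs.clone(); // copy state so branches stay independent
            apply(t0.get(i), m, r);
            interleave(t0, i + 1, t1, j, m, r, results);
        }
        if (j < t1.size()) {
            int[] m = mem.clone(), r = regs.clone();
            apply(t1.get(j), m, r);
            interleave(t0, i, t1, j + 1, m, r, results);
        }
    }

    static void apply(Op op, int[] mem, int[] regs) {
        if (op.isStore) mem[op.addr] = 1;
        else regs[op.reg] = mem[op.addr];
    }

    public static void main(String[] args) {
        System.out.println("SC:              " + outcomes(false, false)); // [(0,1), (1,0), (1,1)]
        System.out.println("relaxed:         " + outcomes(true, false));  // [(0,0), (0,1), (1,0), (1,1)]
        System.out.println("relaxed + fence: " + outcomes(true, true));   // [(0,1), (1,0), (1,1)]
    }
}
```

The outcome `(0,0)` is impossible under sequential consistency and appears only when the store-to-load reordering is allowed. Placing a fence between each thread's store and load removes it again. A naïve sequentially consistent implementation pays for such fences throughout the program. The paper's analyses aim to prove most of them unnecessary, which this toy model does not attempt.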