Efficient and correct execution of parallel programs that share memory

Authors:
Dennis Shasha; Marc Snir
Affiliations:
Courant Institute, New York Univ., New York, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1988

Abstract

In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program on such machines consists of many sequential program segments, each executed by a single processor. These segments interact as they access shared variables. Access to memory is asynchronous, and memory accesses are not necessarily executed in the order they were issued. An execution is correct if it is sequentially consistent: It should seem as if all the instructions were executed sequentially, in an order obtained by interleaving the instruction streams of the processors. Sequential consistency can be enforced by delaying each access to shared memory until the previous access of the same processor has terminated. For performance reasons, however, we want to allow several accesses by the same processor to proceed concurrently. Our analysis finds a minimal set of delays that enforces sequential consistency. The analysis extends to interprocessor synchronization constraints and to code where blocks of operations have to execute atomically. We use a conflict graph similar to that used to schedule transactions in distributed databases. Our graph incorporates the order on operations given by the program text, enabling us to do without locks even when database conflict graphs would suggest that locks are necessary. Our work has implications for the design of multiprocessors; it offers new compiler optimization techniques for parallel languages that support shared variables.
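The abstract's notion of sequential consistency can be made concrete with a small sketch. The two-processor program below, the variable names `x` and `y`, and the helper functions `run` and `outcomes` are illustrative assumptions, not taken from the paper, and the code is a brute-force enumeration rather than the paper's conflict-graph analysis. It lists the read results reachable when each processor's accesses stay in program order (the sequentially consistent interleavings) and compares them with the results reachable when accesses may complete out of issue order.

```python
from itertools import permutations

# Hypothetical two-processor program; each access is (kind, variable),
# and list order is the program order on that processor.
PROGRAM = {
    "P1": [("w", "x"), ("r", "y")],   # P1: x := 1; r1 := y
    "P2": [("w", "y"), ("r", "x")],   # P2: y := 1; r2 := x
}

def run(schedule):
    """Apply accesses one at a time to shared memory; return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    reads = {}
    for proc, (kind, var) in schedule:
        if kind == "w":
            memory[var] = 1
        else:
            reads[proc] = memory[var]
    return (reads["P1"], reads["P2"])

def outcomes(respect_program_order):
    """Collect read results over all global orders of the four accesses.

    With respect_program_order=True only interleavings that keep each
    processor's accesses in program order are considered, which is the
    sequentially consistent set. With False, every order is allowed,
    modelling accesses that complete out of issue order.
    """
    accesses = [(p, a) for p, ops in PROGRAM.items() for a in ops]
    results = set()
    for schedule in permutations(accesses):
        if respect_program_order:
            in_order = all(
                schedule.index((p, ops[0])) < schedule.index((p, ops[1]))
                for p, ops in PROGRAM.items()
            )
            if not in_order:
                continue
        results.add(run(schedule))
    return results

sc_outcomes = outcomes(respect_program_order=True)
relaxed_outcomes = outcomes(respect_program_order=False)

print(sorted(sc_outcomes))                     # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed_outcomes - sc_outcomes))  # [(0, 0)]
```

In this toy program the result `(0, 0)` appears only when a read is allowed to complete before the same processor's earlier write, so delaying each read until that write has terminated is enough to rule it out. Deciding which such delays are actually required for an arbitrary program is the optimization problem the abstract describes.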