Efficient processor support for DRFx, a memory model with exceptions

Authors:
Abhayendra Singh;Daniel Marino;Satish Narayanasamy;Todd Millstein;Madan Musuvathi
Affiliations:
University of Michigan, Ann Arbor, MI, USA;University of California, Los Angeles, CA, USA;University of Michigan, Ann Arbor, MI, USA;University of California, Los Angeles, CA, USA;Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Year:
2011

Citing 38
Cited 8

Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting violations of sequential consistency

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Detecting data races on weak memory systems

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Analyses and optimizations for shared address space programs

Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Weak ordering—a new definition

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Type-based race detection for Java

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
A parameterized type system for race-free Java programs

OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Ownership types for safe programming: preventing data races and deadlocks

OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Shared Memory Consistency Models: A Tutorial

Computer
Simics: A Full System Simulation Platform

Computer
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Proceedings of the 30th annual international symposium on Computer architecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Programming with transactional coherence and consistency (TCC)

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Java memory model

Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler techniques for high performance sequentially consistent java programs

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Making Sequential Consistency Practical in Titanium

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Bulk Disambiguation of Speculative Threads in Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Goldilocks: a race and transaction-aware java runtime

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
Foundations of the C++ concurrency memory model

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
On Validity of Program Transformations in the Java Memory Model

ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
FastTrack: efficient and precise dynamic race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
SigRace: signature-based data race detection

Proceedings of the 36th annual international symposium on Computer architecture
A type and effect system for deterministic parallel Java

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
The java memory model: operationally, denotationally, axiomatically

ESOP'07 Proceedings of the 16th European conference on Programming
DRFX: a simple and efficient memory model for concurrent programming languages

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Proceedings of the 37th annual international symposium on Computer architecture
A case for system support for concurrency exceptions

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism

Data-race exceptions have benefits beyond the memory model

Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Efficient sequential consistency via conflict ordering

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
Static detection of resource contention problems in server-side scripts

Proceedings of the 34th International Conference on Software Engineering
Permission regions for race-free parallelism

RV'11 Proceedings of the Second international conference on Runtime verification
Practical permissions for race-free parallelism

ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Address-aware fences

Proceedings of the 27th international ACM conference on International conference on supercomputing
Low-level detection of language-level data races with LARD

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

A longstanding challenge of shared-memory concurrency is to provide a memory model that allows for efficient implementation while providing strong and simple guarantees to programmers. The C++0x and Java memory models admit a wide variety of compiler and hardwareoptimizations and provide sequentially consistent (SC) semantics for data-race-free programs. However, they either do not provide any semantics (C++0x) or provide a hard-to-understand semantics (Java) for racy programs, compromising the safety and debuggability of such programs. In earlier work we proposed the DRFx memory model, which addresses this problem by dynamically detecting potential violations of SC due to the interaction of compiler or hardware optimizations with data races and halting execution upon detection. In this paper, we present a detailed micro-architecture design for supporting the DRFx memory model, formalize the design and prove its correctness, and evaluate the design using a hardware simulator. We describe a set of DRFx-compliant complexity-effective optimizations which allow us to attain performance close to that of TSO (Total Store Model) and DRF0 while providing strong guarantees for all programs.