A case for an SC-preserving compiler

Authors:
Daniel Marino;Abhayendra Singh;Todd Millstein;Madanlal Musuvathi;Satish Narayanasamy
Affiliations:
University of California, Los Angeles, Los Angeles, CA, USA;University of Michigan, Ann Arbor, Ann Arbor, MI, USA;University of California, Los Angeles, Los Angeles, CA, USA;Microsoft Research, Redmond, Redmond, WA, USA;University of Michigan, Ann Arbor, Ann Arbor, MI, USA
Venue:
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Year:
2011

Abstract

The most intuitive memory consistency model for shared-memory multi-threaded programming is sequential consistency (SC). However, current concurrent programming languages support a relaxed model, as such relaxations are deemed necessary for enabling important optimizations. This paper demonstrates that an SC-preserving compiler, one that ensures that every SC behavior of a compiler-generated binary is an SC behavior of the source program, retains most of the performance benefits of an optimizing compiler. The key observation is that a large class of optimizations crucial for performance are either already SC-preserving or can be modified to preserve SC while retaining much of their effectiveness. An SC-preserving compiler, obtained by restricting the optimization phases in LLVM, a state-of-the-art C/C++ compiler, incurs an average slowdown of 3.8% and a maximum slowdown of 34% on a set of 30 programs from the SPLASH-2, PARSEC, and SPEC CINT2006 benchmark suites. While the performance overhead of preserving SC in the compiler is much less than previously assumed, it might still be unacceptable for certain applications. We believe there are several avenues for improving performance without giving up SC-preservation. In this vein, we observe that the overhead of our SC-preserving compiler arises mainly from its inability to aggressively perform a class of optimizations we identify as eager-load optimizations. This class includes common-subexpression elimination, constant propagation, global value numbering, and common cases of loop-invariant code motion. We propose a notion of interference checks in order to enable eager-load optimizations while preserving SC. Interference checks expose to the compiler a commonly used hardware speculation mechanism that can efficiently detect whether a particular variable has changed its value since last read.
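To make the SC-preservation criterion and the problem with eager-load optimizations concrete, the following is a minimal sketch rather than the authors' implementation. It enumerates every sequentially consistent interleaving of a two-thread toy program and compares the outcomes of the source with those of a version in which a repeated load of X has been replaced by a reuse of the earlier value, as common-subexpression elimination would do. The instruction encoding, the names `interleavings`, `run`, and `sc_outcomes`, and the example programs are all assumptions made for illustration. The `check` instruction is a deliberately simplified stand-in for an interference check: it is modeled as a single atomic compare-and-reload step, which is only an abstraction of the hardware speculation mechanism the abstract mentions.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving on a single shared memory; return (u, v)."""
    mem = {"X": 0, "Y": 0}   # shared variables, initially zero
    regs = {}                # thread-local registers (names are unique here)
    for op, *args in schedule:
        if op == "load":     # reg = mem[var]
            reg, var = args
            regs[reg] = mem[var]
        elif op == "store":  # mem[var] = constant
            var, val = args
            mem[var] = val
        elif op == "copy":   # dst = src, with no memory access
            dst, src = args
            regs[dst] = regs[src]
        elif op == "check":  # assumed model: reuse src unless var has changed
            dst, src, var = args
            regs[dst] = regs[src] if mem[var] == regs[src] else mem[var]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs["u"], regs["v"]


def sc_outcomes(thread_a, thread_b):
    """Set of (u, v) results over all SC interleavings of the two threads."""
    return {run(s) for s in interleavings(thread_a, thread_b)}


# Concurrent context: another thread writes X, then Y.
WRITER = [("store", "X", 1), ("store", "Y", 1)]

# Source: t = X; u = Y; v = X
SOURCE = [("load", "t", "X"), ("load", "u", "Y"), ("load", "v", "X")]
# Eager-load version: the second load of X is replaced by the earlier value.
EAGER = [("load", "t", "X"), ("load", "u", "Y"), ("copy", "v", "t")]
# Eager-load version guarded by the simplified interference check.
CHECKED = [("load", "t", "X"), ("load", "u", "Y"), ("check", "v", "t", "X")]

source = sc_outcomes(SOURCE, WRITER)
eager = sc_outcomes(EAGER, WRITER)
checked = sc_outcomes(CHECKED, WRITER)

# A transformation is SC-preserving on this program if it adds no outcomes.
print("source outcomes:", sorted(source))
print("new outcomes after eager load:", sorted(eager - source))
print("checked version is SC-preserving:", checked <= source)
```

In this toy program the source can never observe u == 1 together with v == 0, because the write to X precedes the write to Y. The eager-load version can, since v reuses a value of X read before both writes, so the transformation is not SC-preserving here. Under the simplified model, the checked version admits exactly the outcomes of the source.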