BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Authors:
W. Ahn;S. Qi;M. Nicolaides;J. Torrellas;J.-W. Lee;X. Fang;S. Midkiff;David Wong
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Purdue University;Purdue University;Purdue University;Intel Corporation
Venue:
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2009

Abstract

A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures have been proposed that support high-performance SC by committing groups of instructions at a time. However, for a platform to support SC, it is not enough for the hardware to do so; the compiler must support SC as well. This paper presents the hardware-compiler interface and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.
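
The claim that SC hardware alone is not enough can be illustrated with the classic store-buffering litmus test. The sketch below is not taken from the paper and does not model BulkCompiler or its ISA primitives. It is a toy enumerator, with hypothetical names (`interleavings`, `run`, `outcomes`), that executes two threads on an idealized SC memory and shows that a compiler reordering that looks harmless within each thread introduces an outcome no SC execution of the original program allows.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)          # indices in the merge taken from a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving on a single shared memory (idealized SC hardware)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, dst, src in schedule:
        if op == "store":
            mem[dst] = src              # store constant src into variable dst
        else:
            regs[dst] = mem[src]        # load variable src into register dst
    return regs["r1"], regs["r2"]

def outcomes(t1, t2):
    """Set of (r1, r2) results over all SC interleavings of the two threads."""
    return {run(s) for s in interleavings(t1, t2)}

# Source program order: each thread stores to one variable, then loads the other.
T1 = [("store", "x", 1), ("load", "r1", "y")]
T2 = [("store", "y", 1), ("load", "r2", "x")]

# Assumed compiler transformation: hoist each load above the independent store.
T1_reordered = [T1[1], T1[0]]
T2_reordered = [T2[1], T2[0]]

sc_results = outcomes(T1, T2)
reordered_results = outcomes(T1_reordered, T2_reordered)

print("source order  :", sorted(sc_results))
print("after reorder :", sorted(reordered_results))
```

In the source order, at least one store precedes both loads in any interleaving, so `(0, 0)` never appears. After the reordering it does, even though the simulated hardware is still sequentially consistent. This is why the compiler needs a mechanism, such as the instruction groups described in the abstract, that lets it optimize without exposing such reorderings to other threads.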