BlockChop: dynamic squash elimination for hybrid processor architecture

Authors:
Jason Mars;Naveen Kumar
Affiliations:
University of Virginia;Intel Corporation
Venue:
Proceedings of the 39th Annual International Symposium on Computer Architecture
Year:
2012

Citing 29
Cited 1

Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic Binary Translation and Optimization

IEEE Transactions on Computers
rePLay: A Hardware Framework for Dynamic Optimization

IEEE Transactions on Computers
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A "flight data recorder" for enabling full-system multiprocessor deterministic replay

Proceedings of the 30th annual international symposium on Computer architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Power Awareness through Selective Dynamically Optimized Traces

Proceedings of the 31st annual international symposium on Computer architecture
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Proceedings of the 32nd annual international symposium on Computer Architecture
Bulk Disambiguation of Speculative Threads in Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Hybrid transactional memory

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
An effective hybrid transactional memory system with strong isolation guarantees

Proceedings of the 34th annual international symposium on Computer architecture
Hardware atomicity for reliable software speculation

Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
A Scalable, Non-blocking Approach to Transactional Memory

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Atom-Aid: Detecting and Surviving Atomicity Violations

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?ciently

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Scalable and reliable communication for hardware transactional memory

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
DMP: deterministic shared memory multiprocessing

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
The Bulk Multicore architecture for improved programmability

Communications of the ACM - Finding the Fun in Computer Science Education
BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A real system evaluation of hardware atomicity for software speculation

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replayvia speculation and external determinism

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
TAO: two-level atomicity for dynamic binary optimizations

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
FlexBulk: intelligently forming atomic blocks in blocked-execution multiprocessors to minimize squashes

Proceedings of the 38th annual international symposium on Computer architecture
RADBench: a concurrency bug benchmark suite

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
LAR-CC: Large atomic regions with conditional commits

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Speculative hardware/software co-designed floating-point multiply-add fusion

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Hybrid processors are HW/SW co-designed processors that leverage blocked-execution, the execution of regions of instructions as atomic blocks, to facilitate aggressive speculative optimization. As we move to a multicore hybrid design, fine grained conflicts for shared data can violate the atomicity requirement of these blocks and lead to expensive squashes and rollbacks. However, as these atomic regions differ from those used in checkpointing and transactional memory systems, the extent of this potentially prohibitive problem remains unclear, and mechanisms to mitigate these squashes dynamically may be critical to enable a highly per-formant multicore hybrid design. In this work, we investigate how multithreaded applications, both benchmark and commercial workloads, are affected by squashes, and present dynamic mechanisms for mitigating these squashes in hybrid processors. While the current wisdom is that there is not a significant number of squashes for smaller atomic regions, we observe this is not the case for many multithreaded workloads. With region sizes of just 200--500 instructions, we observe a performance degradation ranging from 10% to more than 50% for workloads with a mixture of shared reads and writes. By harnessing the unique flexibility provided by the software subsystem of hybrid processor design, we present BlockChop, a framework for dynamically mitigating squashes on multicore hybrid processors. We present a range of squash handling mechanisms leveraging retrials, interpretation, and retranslation, and find that BlockChop is quite effective. Over the current response to exceptions and squashes in a hybrid design, we are able to improve the performance of benchmark and commercial workloads by 1.4x and 1.2x on average for large and small region sizes respectively.