Incremental Commit Groups for Non-Atomic Trace Processing

Authors:
Matt T. Yourst; Kanad Ghose
Affiliations:
State University of New York at Binghamton; State University of New York at Binghamton
Venue:
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2005

Abstract

We introduce techniques to support efficient non-atomic execution of very long traces on a new binary-translation-based, x86-64-compatible VLIW microprocessor. Incrementally committed long traces significantly reduce the computation wasted on exception-induced rollbacks by retaining the correctly committed parts of traces. We divide each scheduled trace into multiple commit groups; each group is committed to the architectural state once all instructions within and prior to that group complete without exceptions. Architectural state updates that must become visible only after future commit points are deferred using a simple hardware commit buffer. We employ a commit depth predictor to predict how many groups a trace will complete, thereby eliminating pipeline flushes on repeated rollbacks. Unlike with atomic traces, instructions may be freely scheduled across commit points throughout the trace to maximize ILP. Commit groups are formed after scheduling, allowing the commit points terminating each group to be inserted more optimally. Commit groups promote significantly faster convergence on optimized traces, since we salvage partially executed traces and splice the working parts together into new optimized traces. We use detailed models to demonstrate how commit groups substantially improve performance (on average, over 1.5× on SPEC 2000) relative to atomic traces.
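
To make the commit-group idea concrete, the following is a minimal software sketch, not the hardware design described in the paper. It models a trace whose operations are tagged with a commit group, a commit buffer that holds back updates belonging to later groups, and a rollback that keeps the groups already committed. Every name here (`Op`, `Machine`, `run_trace`, `LastDepthPredictor`) is hypothetical, and two things are assumptions made for illustration: each register is written at most once per trace, and the predictor is a plain last-value scheme, since the abstract does not say how the commit depth predictor is built.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One scheduled operation (illustrative model, not the paper's encoding)."""
    dest: str             # architectural register written by this op
    value: int            # result value
    group: int            # commit group the result belongs to
    faults: bool = False  # True if executing this op raises an exception


@dataclass
class Machine:
    arch: dict = field(default_factory=dict)    # committed architectural state
    buffer: list = field(default_factory=list)  # deferred (group, dest, value)

    def run_trace(self, schedule, commit_points):
        """Run ops in scheduled order; return how many groups committed.

        commit_points maps a schedule index to the group that commits
        right after the op at that index has executed.
        """
        self.buffer.clear()
        committed = 0
        for i, op in enumerate(schedule):
            if op.faults:
                # Rollback: drop uncommitted updates, keep committed groups.
                self.buffer.clear()
                return committed
            self.buffer.append((op.group, op.dest, op.value))
            if i in commit_points:
                g = commit_points[i]
                # Release updates for groups <= g; later groups stay deferred.
                for grp, dest, val in self.buffer:
                    if grp <= g:
                        self.arch[dest] = val
                self.buffer = [u for u in self.buffer if u[0] > g]
                committed = g + 1
        return committed


class LastDepthPredictor:
    """Assumed last-value predictor: guess the depth observed last time."""

    def __init__(self):
        self.last = {}

    def predict(self, trace_id, total_groups):
        # With no history, optimistically predict that every group commits.
        return self.last.get(trace_id, total_groups)

    def update(self, trace_id, committed):
        self.last[trace_id] = committed


# A group-1 op ("rcx") is scheduled ahead of group 0's commit point,
# so its result must wait in the buffer until group 1 commits.
schedule = [
    Op("rax", 1, group=0),
    Op("rcx", 3, group=1),
    Op("rbx", 2, group=0),
    Op("rdx", 4, group=1, faults=True),
]
commit_points = {2: 0, 3: 1}

machine = Machine()
predictor = LastDepthPredictor()
depth = machine.run_trace(schedule, commit_points)
predictor.update("trace0", depth)
```

In this run the fault in the last operation discards the buffered `rcx` update, while the `rax` and `rbx` results from group 0 stay committed; an atomic trace would have discarded all of them. The predictor then records that the trace completed one group, which a fetch stage could use on the next visit to stop after that group instead of flushing again.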