Reexecution and Selective Reuse in Checkpoint Processors

Authors:
Amit Golander;Shlomo Weiss
Affiliations:
Tel Aviv University, Tel Aviv, Israel 69978;Tel Aviv University, Tel Aviv, Israel 69978
Venue:
Transactions on High-Performance Embedded Architectures and Compilers II
Year:
2009

Citing 32
Cited 1

Implementing Precise Interrupts in Pipelined Processors

IEEE Transactions on Computers
Zero-cycle loads: microarchitecture support for reducing load latency

Proceedings of the 28th annual international symposium on Microarchitecture
Assigning confidence to conditional branch predictions

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Reducing branch misprediction penalties via dynamic control independence detection

ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamic removal of redundant computations

ICS '99 Proceedings of the 13th international conference on Supercomputing
Read-after-read memory dependence prediction

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Skipper: a microarchitecture for exploiting control-flow independence

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The PowerPC 604 RISC microprocessor

IEEE Micro
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note)

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Recycling waste: exploiting wrong-path execution to improve branch prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Internal architecture of Alpha 21164 microprocessor

COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor

COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
A Study of Control Independence in Superscalar Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Synthesizable RAM-Alternative to Low Configuration Compiler Memory for Die Area Reduction

VLSID '00 Proceedings of the 13th International Conference on VLSI Design
Improving Processor Performance by Simplifying and Bypassing Trivial Computations

ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Toward kilo-instruction processors

ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of a resource efficient checkpoint architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Control-Flow Independence Reuse via Dynamic Vectorization

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Scalable Load and Store Processing in Latency Tolerant Processors

Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor

IEEE Computer Architecture Letters
BranchTap: improving performance with very few checkpoints through adaptive speculation control

Proceedings of the 20th annual international conference on Supercomputing
Hiding the misprediction penalty of a resource-efficient high-performance processor

ACM Transactions on Architecture and Code Optimization (TACO)
IBM Power5 Chip: A Dual-Core Multithreaded Processor

IEEE Micro
“Look it up" or "do the math": an energy, area, and timing analysis of instruction reuse and memoization

PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems

Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Resource-efficient checkpoint processors have been shown to recover to an earlier safe state very fast. Yet in order to complete the misprediction recovery they also need to reexecute the code segment between the recovered checkpoint and the mispredicted instruction. This paper evaluates two novel reuse methods which accelerate reexecution paths by reusing the results of instructions and the outcome of branches obtained during the first run. The paper also evaluates, in the context of checkpoint processors, two other reuse methods targeting trivial and repetitive arithmetic operations. A reuse approach combining all four methods requires an area of 0.87[mm2], consumes 51.6[mW], and improves the energy-delay product by 4.8% and 11.85% for the integer and floating point benchmarks respectively.