Idempotent processor architecture

Authors:
Marc de Kruijf; Karthikeyan Sankaralingam
Affiliations:
University of Wisconsin, Madison; University of Wisconsin, Madison
Venue:
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2011

Abstract

Improving architectural energy efficiency is important for offsetting the diminishing energy efficiency gains from technology scaling. At the same time, limiting hardware complexity is also important. This paper presents a new processor architecture, the idempotent processor architecture, that advances both goals through a new execution paradigm: speculative execution without hardware checkpoints, with recovery from mis-speculation by re-execution alone. Idempotent processors execute programs as a sequence of compiler-constructed idempotent (re-executable) regions. The nature of these regions allows precise state to be reproduced by re-execution, obviating the need for hardware recovery support. We build upon the insight that programs naturally decompose into a series of idempotent regions and that these regions can be large. The paradigm of executing idempotent regions, which we call idempotent processing, can be used to support various types of speculation, including branch prediction, dependence prediction, and execution in the presence of hardware faults or exceptions. In this paper, we demonstrate how idempotent processing simplifies the design of in-order processors. Conventional in-order processors incur significant complexity to achieve high performance while supporting variable-latency instructions and enforcing precise exceptions. Idempotent processing eliminates much of this complexity and the resulting inefficiency by allowing instructions to retire out of order, with re-execution used when necessary to recover precise state. Across a diverse set of benchmark suites, our quantitative results show a geometric mean performance increase of 4.4% (up to 25% and beyond) while maintaining an overall reduction in power and hardware complexity.
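
The idea of recovering by re-execution rather than from a checkpoint can be illustrated with a small sketch. This is a toy model, not the paper's compiler or hardware: memory is a Python dict, a region is a list of functions, and the names `run`, `survives_reexecution`, `IDEMPOTENT`, and `CLOBBERING` are hypothetical. It assumes that a region is safe to re-execute when it never overwrites a value it reads as input, and it models a fault as a restart from the top of the region on whatever state the partial execution left behind.

```python
def run(region, mem, fault_after=None):
    """Execute every op in `region` on `mem`.

    If `fault_after` is the index of an op, a fault is injected right after
    that op completes. Recovery restores no checkpoint: the whole region is
    simply re-executed from its first op on the current (partially updated) mem.
    """
    for i, op in enumerate(region):
        op(mem)
        if i == fault_after:
            for retry_op in region:  # recovery by re-execution only
                retry_op(mem)
            return mem
    return mem


def survives_reexecution(region, initial):
    """True if a fault after any op still yields the fault-free result."""
    expected = run(region, dict(initial))
    return all(
        run(region, dict(initial), fault_after=k) == expected
        for k in range(len(region))
    )


# Region that reads input x and writes only t and y (input is preserved).
def load_add(mem):
    mem["t"] = mem["x"] + 1

def store_double(mem):
    mem["y"] = mem["t"] * 2

IDEMPOTENT = [load_add, store_double]


# Region that overwrites its own input x before it is done with it.
def increment_in_place(mem):
    mem["x"] = mem["x"] + 1

def store_double_x(mem):
    mem["y"] = mem["x"] * 2

CLOBBERING = [increment_in_place, store_double_x]


if __name__ == "__main__":
    print(survives_reexecution(IDEMPOTENT, {"x": 3}))  # True
    print(survives_reexecution(CLOBBERING, {"x": 3}))  # False
```

In the second region the restart sees `x` already incremented, so re-execution produces a different result than the fault-free run. In this toy model, avoiding such overwrites of a region's inputs is what makes re-execution alone sufficient for recovery.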