Cherry: checkpointed early resource recycling in out-of-order microprocessors

Authors: José F. Martínez, Jose Renau, Michael C. Huang, Milos Prvulovic, Josep Torrellas
Affiliations: Cornell University; University of Illinois at Urbana-Champaign; University of Rochester; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign
Venue: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Year: 2002

Abstract

This paper presents CHeckpointed Early Resource RecYcling (Cherry), a hybrid mode of execution based on the reorder buffer (ROB) and checkpointing that decouples resource recycling from instruction retirement. Resources are recycled early, which results in more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions whose resources have already been recycled. Cherry leverages the ROB to (1) avoid requiring in-order execution as a fallback mechanism, (2) allow memory replay traps and branch mispredictions without rolling back to the Cherry checkpoint, and (3) quickly fall back to conventional out-of-order execution without rolling back to the checkpoint or flushing the pipeline.

We present a Cherry implementation with early recycling at three different points of the execution engine: the load queue, the store queue, and the register file. We report average speedups of 1.06 and 1.26 on SPECint and SPECfp applications, respectively, relative to an aggressive conventional architecture. We also describe how Cherry and speculative multithreading can be combined and complement each other.
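The decoupling of recycling from retirement can be illustrated with a toy register renamer. This is a minimal sketch under stated assumptions, not the paper's implementation: the class `ToyCherryRegs`, its methods, and the early-release rule are hypothetical. The assumed rule is that a physical register may be freed once no pending consumer still has to read it and the instruction that redefined its logical register can no longer be squashed by a branch misprediction or memory replay trap. Only an exception can then undo the redefinition, and the model handles that case by restoring architectural register values from a checkpoint, as the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class PhysReg:
    value: int = 0
    pending_reads: int = 0          # renamed consumers that have not read this register yet
    redefined_safely: bool = False  # ASSUMPTION: redefiner can no longer be squashed by a mispredict/replay trap


class ToyCherryRegs:
    """Hypothetical toy model of early physical-register recycling with checkpoint rollback."""

    def __init__(self, n_logical: int, n_phys: int):
        assert n_phys > n_logical
        self.n_logical = n_logical
        self.regs = [PhysReg() for _ in range(n_phys)]
        self.rat = list(range(n_logical))              # logical -> physical mapping
        self.free_list = list(range(n_logical, n_phys))
        self.checkpoint = [0] * n_logical              # architectural values at the checkpoint

    def take_checkpoint(self) -> None:
        # Save the current logical register values (assumes all producers have written back).
        self.checkpoint = [self.regs[p].value for p in self.rat]

    def rename_dest(self, logical: int) -> tuple:
        """Map `logical` to a fresh physical register; return (new, old). Real hardware would stall on an empty free list."""
        new = self.free_list.pop(0)
        old = self.rat[logical]
        self.rat[logical] = new
        self.regs[new] = PhysReg()
        return new, old

    def try_recycle_early(self, phys: int) -> bool:
        # Free `phys` before the redefining instruction retires, if the assumed rule allows it.
        r = self.regs[phys]
        if (r.pending_reads == 0 and r.redefined_safely
                and phys not in self.rat and phys not in self.free_list):
            self.free_list.append(phys)
            return True
        return False

    def rollback(self) -> None:
        # An exception hit an instruction whose resources may already be recycled:
        # rebuild the register state from the checkpointed architectural values.
        n_phys = len(self.regs)
        self.regs = [PhysReg() for _ in range(n_phys)]
        self.rat = list(range(self.n_logical))
        for logical, value in enumerate(self.checkpoint):
            self.regs[logical].value = value
        self.free_list = list(range(self.n_logical, n_phys))


if __name__ == "__main__":
    rf = ToyCherryRegs(n_logical=2, n_phys=4)
    rf.regs[0].value, rf.regs[1].value = 10, 20
    rf.take_checkpoint()
    new, old = rf.rename_dest(0)         # a younger instruction redefines r0
    rf.regs[new].value = 99
    rf.regs[old].redefined_safely = True
    print(rf.try_recycle_early(old))     # old mapping of r0 is freed before retirement
    rf.rollback()                        # an exception forces a return to the checkpoint
    print([rf.regs[p].value for p in rf.rat])
```

In a conventional design, the old physical register would stay allocated until the redefining instruction retired; in this sketch it returns to the free list as soon as the assumed conditions hold, and correctness on an exception comes from the checkpoint rather than from the ROB. The model deliberately ignores the load and store queues and memory state.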