Checkpoint repair for out-of-order execution machines
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Register renaming and dynamic speculation: an alternative approach
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor
COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
POWER4 system microarchitecture
IBM Journal of Research and Development
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
Memory Ordering: A Value-Based Approach
Proceedings of the 31st annual international symposium on Computer architecture
Prophet/Critic Hybrid Branch Prediction
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory Ordering: A Value-Based Approach
IEEE Micro
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Scalable Load and Store Processing in Latency Tolerant Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Buffer Design in First-Level Multibanked Data Caches
Proceedings of the 32nd annual international symposium on Computer Architecture
Virtualizing Transactional Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Energy reduction in multiprocessor systems using transactional memory
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory State Compressors for Giga-Scale Checkpoint/Restore
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Incremental Commit Groups for Non-Atomic Trace Processing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 33rd annual international symposium on Computer Architecture
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Proceedings of the 33rd annual international symposium on Computer Architecture
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Decomposing the load-store queue by function for power reduction and scalability
IBM Journal of Research and Development
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
Substituting associative load queue with simple hash tables in out-of-order microprocessors
Proceedings of the 2006 international symposium on Low power electronics and design
A simple speculative load control mechanism for energy saving
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Reducing energy of virtual cache synonym lookup using bloom filters
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
DMDC: Delayed Memory Dependence Checking through Age-Based Filtering
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Cache Miss Handling for High Memory-Level Parallelism
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
Speculative optimization using hardware-monitored guarded regions for java virtual machines
Proceedings of the 3rd international conference on Virtual execution environments
An L2-miss-driven early register deallocation for SMT processors
Proceedings of the 21st annual international conference on Supercomputing
Building a large instruction window through ROB compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Energy saving through a simple load control mechanism
ACM SIGARCH Computer Architecture News
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
A Two-Level Load/Store Queue Based on Execution Locality
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Reducing register pressure in SMT processors through L2-miss-driven early register release
ACM Transactions on Architecture and Code Optimization (TACO)
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A performance-correctness explicitly-decoupled architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A distributed processor state management architecture for large-window processors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
An energy-efficient checkpointing mechanism for out of order commit processor
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Design and optimization of the store vectors memory dependence predictor
ACM Transactions on Architecture and Code Optimization (TACO)
Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack
Proceedings of the 46th Annual Design Automation Conference
Folding active list for high performance and low power
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Turbo-ROB: a low cost checkpoint/restore accelerator
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Relax: an architectural framework for software recovery of hardware faults
Proceedings of the 37th annual international symposium on Computer architecture
On the latency and energy of checkpointed superscalar register alias tables
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An event-guided approach to reducing voltage noise in processors
Proceedings of the Conference on Design, Automation and Test in Europe
CROB: implementing a large instruction window through compression
Transactions on high-performance embedded architectures and compilers III
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Efficient system-on-chip energy management with a segmented bloom filter
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Non-uniform instruction scheduling
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Idempotent processor architecture
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Achieving reliable system performance by fast recovery of branch miss prediction
Journal of Network and Computer Applications
Complexity-Effective rename table design for rapid speculation recovery
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Tuning the continual flow pipeline architecture
Proceedings of the 27th international ACM conference on International conference on supercomputing
Virtual register renaming: energy efficient substrate for continual flow pipelines
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Enhancing NBTI recovery in SRAM arrays through recovery boosting
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Tuning the continual flow pipeline architecture with virtual register renaming
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Large instruction window processors achieve high performance by exposing large amounts of instruction levelparallelism. However, accessing large hardware structurestypically required to buffer and process such instructionwindow sizes significantly degrade the cycle time. This paper proposes a novel Checkpoint Processing and Recovery(CPR) microarchitecture, and shows how to implement alarge instruction window processor without requiring largestructures thus permitting a high clock frequency.We focus on four critical aspects of a microarchitecture:1) scheduling instructions, 2) recovering from branch mispredicts, 3) buffering a large number of stores and forwarding data from stores to any dependent load, and 4) reclaiming physical registers. While scheduling window size isimportant, we show the performance of large instructionwindows to be more sensitive to the other three design issues. Our CPR proposal incorporates novel microarchitectural schemes for addressing these design issues-a selective checkpoint mechanism for recovering from mispredicts,a hierarchical store queue organization for fast store-loadforwarding, and an effective algorithm for aggressive physical register reclamation. Our proposals allow a processor to realize performance gains due to instruction windowsof thousands of instructions without requiring large cycle-critical hardware structures.