Checkpoint repair for out-of-order execution machines
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The agree predictor: a mechanism for reducing negative branch history interference
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Dynamic memory disambiguation in the presence of out-of-order store issuing
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Dual path instruction processing
ICS '02 Proceedings of the 16th international conference on Supercomputing
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Advanced Computer Architectures
Advanced Computer Architectures
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Automatic Generation of Microarchitecture Simulators
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
Dynamic Branch Prediction with Perceptrons
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing Branch Misprediction Penalty via Selective Branch Recovery
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Superscalar architecture design for high performance DSP operations
Microprocessors & Microsystems
A distributed processor state management architecture for large-window processors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Turbo-ROB: a low cost checkpoint/restore accelerator
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Achieving reliable system performance by fast recovery of branch miss prediction
Journal of Network and Computer Applications
Hi-index | 0.00 |
Current trends in modern out-of-order processors involve implementing deeper pipelines and a large instruction window to achieve high performance. However, as pipeline depth increases, the branch misprediction penalty becomes a critical factor in overall processor performance. Current approaches to handling branch mispredictions either incrementally roll back to in-order state by waiting until the mispredicted branch reaches the head of the reorder buffer, or utilize checkpointing at branches for faster recovery. Rolling back to in-order state stalls the pipeline for a significant number of cycles and checkpointing is costly.This paper proposes a fast recovery mechanism, called Eager Misprediction Recovery (EMR), to reduce the branch misprediction penalty. Upon a misprediction, the processor immediately starts fetching and renaming instructions from the correct path without restoring the map table. Those instructions that access incorrect speculative values wait until the correct data are restored; however, instructions that access correct values continue executing while recovery occurs. Thus, the recovery mechanism hides the latency of long branch recovery with useful instructions.EMR achieves a mean performance improvement very close to a recovery mechanism that supports checkpointing at each branch. In addition, EMR provides an average of 9.0% and up to 19.9% better performance than traditional sequential misprediction recovery on the SPEC2000 benchmark suite.