Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Confidence estimation for speculation control
Proceedings of the 25th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Memory State Compressors for Giga-Scale Checkpoint/Restore
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Compiler-directed high-level energy estimation and optimization
ACM Transactions on Embedded Computing Systems (TECS)
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
IEEE Transactions on Computers
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
On the latency, energy and area of checkpointed, superscalar register alias tables
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Building a large instruction window through ROB compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
A physical level study and optimization of CAM-based checkpointed register alias table
Proceedings of the 13th international symposium on Low power electronics and design
A distributed processor state management architecture for large-window processors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint allocation and release
ACM Transactions on Architecture and Code Optimization (TACO)
An energy-efficient checkpointing mechanism for out of order commit processor
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Turbo-ROB: a low cost checkpoint/restore accelerator
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A power-aware hybrid RAM-CAM renaming mechanism for fast recovery
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
On the latency and energy of checkpointed superscalar register alias tables
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CROB: implementing a large instruction window through compression
Transactions on high-performance embedded architectures and compilers III
Achieving reliable system performance by fast recovery of branch miss prediction
Journal of Network and Computer Applications
Something old and something new: P-states can borrow microarchitecture techniques too
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Hi-index | 0.00 |
High performance processors use checkpointing to rapidly recover from branch mispredictions and possibly other exceptions. We demonstrate that conventional checkpointing becomes unattractive in terms of resource and power requirements for future generation processors. We propose out-of-order checkpoint release and checkpoint prediction, two alternatives that require significantly less resources and power while maintaining high-performance. We demonstrate their utility at the register alias table (RAT). Our methods reduce the number of RAT checkpoints to 1/3 (from 48 down to 16) for an aggressive, 8-way superscalar processor with a 256-entry instruction window. Using a 0.18um process model we estimate that RAT power is reduced by 24%.