On the latency and energy of checkpointed superscalar register alias tables

Authors:
Elham Safi;Andreas Moshovos;Andreas Veneris
Affiliations:
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2010

Abstract

This paper investigates how the latency and energy of register alias tables (RATs) vary as a function of the number of global checkpoints (GCs), processor issue width, and window size. It improves upon previous RAT checkpointing work, which ignored the actual latency and energy tradeoffs and evaluated performance solely in terms of instructions per cycle (IPC). This work uses measurements from full-custom checkpointed RAT implementations developed in a commercial 130-nm fabrication technology. Combining physical- and architectural-level evaluations, this paper demonstrates the tradeoffs among the aggressiveness of RAT checkpointing, performance, and energy. It also shows that, as expected, focusing on IPC alone incorrectly predicts performance. The results of this study justify checkpointing techniques that use very few GCs (e.g., four). Additionally, based on the full-custom implementations of the checkpointed RATs, this paper presents analytical latency and energy models. These models can be useful in the early stages of architectural exploration, where actual physical implementations are unavailable or hard to develop. For a variety of RAT organizations, the model estimates are within 6.4% and 11.6% of circuit simulation results for latency and energy, respectively. This range of accuracy is acceptable for architectural-level studies.