Power-efficient clustering via incomplete bypassing

Authors:
Eric P. Villasenor;DaeHo Seo;Mithuna S. Thottethodi
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 13th international symposium on Low power electronics and design
Year:
2008

Citing 15
Cited 0

Hierarchical registers for scientific computers

ICS '88 Proceedings of the 2nd international conference on Supercomputing
The performance impact of incomplete bypassing in processor pipelines

Proceedings of the 28th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Using SimPoint for accurate and efficient simulation

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Banked multiported register files for high-frequency superscalar microprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Instruction Replication: Reducing Delays Due to Inter-PE Communication Latency

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
A case study: power and performance improvement of a chip multiprocessor for transaction processing

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Researchers have proposed clustered microarchitectures for performance and energy effciency. Typically, clustered microarchitectures offer fast, local bypassing between instructions within clusters but global bypasses are slower. Traditional clustered microarchitectures (TCM) are implemented by partitioning the register file and associated functional units to clusters. This paper demonstrates an alternate implementation - Incomplete bypass-based clustered microarchitecture (IBCM). IBCM reduces the length of bypass wires by 42.4% resulting in an 8.9% reduction of "Execute" stage delay. This delay reduction in the critical EX stage enables voltage scaling that results in significantly lower average power consumption (between 11.7% and 19.5% lower) while achieving identical performance.