Trifecta: a nonspeculative scheme to exploit common, data-dependent subcritical paths

  • Authors:
  • Patrick Ndai (Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN)
  • Nauman Rafique (Google, San Francisco, CA, and Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN)
  • Mithuna Thottethodi (Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN)
  • Swaroop Ghosh (Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN)
  • Swarup Bhunia (Electrical and Computer Engineering Department, Case Western Reserve University, Cleveland, OH)
  • Kaushik Roy (Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN)

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2010


Abstract

Pipelined processor cores are conventionally designed so that the critical paths of the critical pipeline stage(s) fit in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases because critical paths are rarely exercised; configuring the pipeline for these rarely used paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta, an architectural technique that completes common-case, subcritical-path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast to existing mechanisms that trade power or performance for yield, Trifecta improves yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) processor and a single-issue in-order (InO) processor, namely the instruction issue and execute stages, respectively. Our experiments show that the rare two-cycle operations cause only a small decrease in instructions per cycle (5% for the integer and 2% for the floating-point SPEC2000 benchmarks). However, the increased delay slack improves yield-adjusted throughput by 20% for the InO configuration and by 12.7% for the OoO configuration.
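
The core idea can be illustrated with a small, hypothetical cycle-count sketch (this is not the authors' implementation; the operation stream, the 1-in-20 critical-path rate, and the simple stage model below are illustrative assumptions): a Trifecta-style stage spends one cycle on a common, subcritical operation and two cycles only when an operation exercises the critical path, so the average latency stays close to one cycle when critical paths are rare.

    # Minimal, hypothetical sketch of a Trifecta-style variable-latency stage.
    # Not the paper's implementation: detection of critical-path operations and
    # the workload mix are assumptions made purely for illustration.

    from dataclasses import dataclass

    @dataclass
    class Op:
        name: str
        exercises_critical_path: bool  # assumed known or detected per operation

    def simulate_stage(ops):
        """Count the cycles one Trifecta-style stage spends on a stream of ops."""
        cycles = 0
        for op in ops:
            # Common case: the data-dependent path is subcritical, one cycle.
            # Rare case: the critical path is exercised, so the stage takes a
            # second cycle to let the result settle (correctness preserved).
            cycles += 2 if op.exercises_critical_path else 1
        return cycles

    if __name__ == "__main__":
        # Illustrative workload: critical paths are exercised 1 time in 20,
        # so the average latency stays near one cycle per operation.
        stream = [Op(f"op{i}", exercises_critical_path=(i % 20 == 0))
                  for i in range(100)]
        total = simulate_stage(stream)
        print(f"{total} cycles for {len(stream)} ops "
              f"({total / len(stream):.2f} cycles/op on average)")

Under these assumed numbers the stream finishes in 105 cycles (1.05 cycles per operation), which mirrors the abstract's claim that the rare two-cycle operations cost only a few percent of throughput while every operation gains timing slack.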