With increasing parameter variations, functional units (FUs) on a chip experience considerable local variation in maximum operating frequency. If such within-die variations in a superscalar processor are handled by worst-case frequency assignment, the resulting yield in high-frequency bins is overly pessimistic. In this paper, we propose VAIL, a novel low-overhead instruction scheduling strategy that assigns the best-case frequency by issuing narrow-width (NW) operations to the slower units. It exploits the abundance of NW operations (about 70%) in a typical program and the fact that these operations do not activate the critical paths of the FUs. Compared to the existing vari-cycle approach, the proposed scheme substantially improves yield (~27% at the highest performance bin) and profit (10-15%) for a set of benchmark applications. It also improves the thermal profile of the FUs. Finally, it offers a large opportunistic power saving (~43%) in the slow units through supply gating of inactive bit-slices.
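The steering idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not VAIL's actual scheduler: the 16-bit narrow-width threshold, the `FunctionalUnit` and `Op` types, and the `issue` function are hypothetical, and each unit is simply flagged slow or fast instead of being modeled with real timing. Narrow-width operations prefer slow units and fall back to fast ones; full-width operations go only to fast units, because in this simplified model they are the ones that may exercise a slow unit's critical path.

```python
from dataclasses import dataclass

# Hypothetical narrow-width threshold (assumption for illustration only).
NARROW_BITS = 16


def is_narrow(value: int, bits: int = NARROW_BITS) -> bool:
    """True if a two's-complement value fits in `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


@dataclass
class FunctionalUnit:
    name: str
    slow: bool  # assumed flag: unit misses the target frequency due to variation
    busy: bool = False


@dataclass
class Op:
    opcode: str
    src1: int
    src2: int


def is_narrow_op(op: Op) -> bool:
    # Treat an operation as narrow-width only if both source operands are narrow.
    return is_narrow(op.src1) and is_narrow(op.src2)


def issue(ops, units):
    """Assign ready ops to free units for one cycle.

    Returns a list of (opcode, unit name or None); None means the op stalls.
    """
    free_fast = [u for u in units if not u.busy and not u.slow]
    free_slow = [u for u in units if not u.busy and u.slow]
    schedule = []
    for op in ops:
        if is_narrow_op(op):
            # NW ops prefer slow units, falling back to fast ones if none are free.
            pool = free_slow or free_fast
        else:
            # Full-width ops are never sent to slow units.
            pool = free_fast
        if pool:
            unit = pool.pop(0)
            unit.busy = True
            schedule.append((op.opcode, unit.name))
        else:
            schedule.append((op.opcode, None))
    return schedule
```

For example, with one fast unit `alu0` and one slow unit `alu1`, issuing `Op("add", 5, 7)` and `Op("add", 1 << 40, 3)` in the same cycle places the narrow add on `alu1` and the wide add on `alu0`. Keeping wide operations off slow units is what lets the slow units run at the best-case clock in this model, at the cost of possible stalls when wide operations outnumber free fast units.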