Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors

Authors:
Houman Homayoun;Sudeep Pasricha;Mohammad Makhzan;Alex Veidenbaum
Affiliations:
University of California Irvine, Irvine, CA, USA;University of California Irvine, Irvine, CA, USA;University of California Irvine, Irvine, CA, USA;University of California Irvine, Irvine, CA, USA
Venue:
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2008

Citing 16
Cited 3

Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors

GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Low-complexity reorder buffer architecture

ICS '02 Proceedings of the 16th international conference on Supercomputing
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Banked multiported register files for high-frequency superscalar microprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Leakage Energy Reduction in Register Renaming

ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Increasing Processor Performance Through Early Register Release

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Evaluation of Speed and Area of Clustered VLIW Processors

VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
Dynamic Resizing of Superscalar Datapath Components for Energy Efficiency

IEEE Transactions on Computers
Efficient design space exploration of high performance embedded out-of-order processors

Proceedings of the conference on Design, automation and test in Europe: Proceedings

Multiple sleep modes leakage control in peripheral circuits of a all major SRAM-based processor units

Proceedings of the 7th ACM international conference on Computing frontiers
RELOCATE: register file local access pattern redistribution mechanism for power and thermal management in out-of-order embedded processor

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

While Ultra Deep Submicron (UDSM) CMOS scaling gives embedded processor designers ample silicon budget to increase processor resources to improve performance, restrictions with the power budget and practically achievable operating clock frequencies act as limiting factors. In this paper we show how just increasing processor resource size is not effective in improving performance due to constraints on achievable operating clock frequency. In response we propose two adaptive resource resizing techniques L2RS and L2ML1RS that adaptively resize resources by exploiting cache misses. Our results show a significant performance improvement and overall energy-delay reduction of on average 9.2% (upto 34%) and 3.8% respectively across SPEC2K benchmarks for L2ML1RS. Applying L2RS resulted in 6.8% performance improvement (upto 24%) and 4.6% energy-delay reduction. We also present the required circuit modification to apply these techniques which shown to be minimal.