While Ultra Deep Submicron (UDSM) CMOS scaling gives embedded processor designers ample silicon budget to enlarge processor resources and thereby improve performance, the power budget and the practically achievable operating clock frequency act as limiting factors. In this paper we show that simply increasing the size of processor resources does not effectively improve performance, because of constraints on the achievable operating clock frequency. In response, we propose two adaptive resource resizing techniques, L2RS and L2ML1RS, which resize resources adaptively by exploiting cache misses. Across the SPEC2K benchmarks, L2ML1RS achieves a significant performance improvement of 9.2% on average (up to 34%) and an average overall energy-delay reduction of 3.8%. Applying L2RS results in a 6.8% performance improvement (up to 24%) and a 4.6% energy-delay reduction. We also present the circuit modifications required to apply these techniques, which are shown to be minimal.
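The abstract describes resizing resources "by exploiting cache misses" without giving the control rule, so the sketch below is only one hypothetical reading, not the exact L2RS or L2ML1RS policy. It assumes a resource (for example an issue queue) with a small configuration used in normal operation and a large configuration enabled while at least one L2 miss is outstanding. The names `ResizableResource` and `update_size`, and the sizes 32 and 64, are illustrative inventions. The shrink step is deferred until the occupied entries fit in the small configuration, so that no in-flight instruction is discarded.

```python
from dataclasses import dataclass, field


@dataclass
class ResizableResource:
    """Hypothetical model of a resizable structure such as an issue queue or ROB."""
    small_size: int  # entries enabled in normal operation (assumed to meet the target clock)
    large_size: int  # entries enabled while an L2 miss is outstanding (assumption)
    active_size: int = field(init=False)
    occupancy: int = 0  # entries currently holding instructions

    def __post_init__(self) -> None:
        if not 0 < self.small_size <= self.large_size:
            raise ValueError("need 0 < small_size <= large_size")
        self.active_size = self.small_size


def update_size(res: ResizableResource, outstanding_l2_misses: int) -> int:
    """One control step of a hypothetical miss-driven resizing policy.

    Grow to the large configuration whenever at least one L2 miss is pending.
    Shrink back only once no misses are pending AND the occupied entries fit
    in the small configuration. Returns the new active size.
    """
    if outstanding_l2_misses > 0:
        res.active_size = res.large_size
    elif res.occupancy <= res.small_size:
        res.active_size = res.small_size
    # Otherwise keep the current size until enough entries drain.
    return res.active_size


# Usage: a short made-up trace of (outstanding L2 misses, queue occupancy) per step.
iq = ResizableResource(small_size=32, large_size=64)
trace = [(0, 10), (2, 30), (1, 50), (0, 50), (0, 20)]
sizes = []
for misses, occ in trace:
    iq.occupancy = occ
    sizes.append(update_size(iq, misses))
print(sizes)  # [32, 64, 64, 64, 32]
```

In the fourth step the miss has been serviced, but 50 entries are still occupied, so the resource stays at its large size until occupancy drops to 32 or below. A hardware implementation would also have to account for the access-latency and clock-frequency effects of the larger configuration that the abstract identifies as the limiting factor; this toy model ignores them.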