Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Design issues for dynamic voltage scaling
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
ACM Computing Surveys (CSUR)
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints
Proceedings of the conference on Design, automation and test in Europe
Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Aggregating processor free time for energy reduction
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Compilation techniques for energy reduction in horizontally partitioned cache architectures
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Improving performance of nested loops on reconfigurable array processors
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hi-index | 0.00 |
Processor Idle Cycle Aggregation (PICA) is a promising approach for low power execution of processors, in which small memory stalls are aggregated to create a large one, and the processor is switched to low-power mode in it. We extend the previous proposed approach in two dimensions. i) We develop static analysis for the PICA technique and present optimum parameters for five common types of loops based on steady-state analysis. ii) We show that software only control is unable to guarantee its correctness in a varying runtime environment, potentially causing deadlocks. We enhance the robustness of PICA with minimal hardware extension, ensuring correct execution for any loops and parameters, which greatly facilitates exploration based parameter optimization. The combined use of our static analysis and exploration based fine-tuning makes the PICA technique applicable, to any memory-bound loop, with energy reduction. We validate our analytical models against simulation based optimization and also show through our experiments on embedded application benchmarks, that our technique can be applied to a wide range of loops with average 20% energy reductions compared to executions without PICA.