Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
A survey of design techniques for system-level dynamic power management
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low-power electronics and design
ACM Computing Surveys (CSUR)
Power Aware Design Methodologies
Power Aware Design Methodologies
XTREM: a power simulator for the Intel XScale® core
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Static analysis of processor stall cycle aggregation
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Processor Description Languages
Processor Description Languages
Adaptive power management for real-time event streams
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Hybrid dynamic energy and thermal management in heterogeneous embedded multiprocessor SoCs
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
PICA: Processor Idle Cycle Aggregation for Energy-Efficient Embedded Systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Even after carefully tuning the memory characteristics to the application properties and the processor speed, during the execution of real applications there are times when the processor stalls, waiting for data from the memory. Processor stall can be used to increase the throughput by temporarily switching to a different thread of execution, or reduce the power and energy consumption by temporarily switching the processor to low-power mode. However, any such technique has a performance overhead in terms of switching time. Even though over the execution of an application the processor is stalled for a considerable amount of time, each stall duration is too small to profitably perform any state switch. In this paper, we present code transformations to aggregate processor free time. Our experiments on the Intel XScale and Stream kernels show that up to 50,000 processor cycles can be aggregated, and used to profitably switch the processor to low-power mode. We further show that our code transformations can switch the processor to low-power mode for up to 75% of kernel runtime, achieving up to 18% of processor energy savings on multimedia applications. Our technique requires minimal architectural modifications and incurs negligible (