Load elimination for low-power embedded processors

Authors:
Gokhan Memik;Mahmut T. Kandemir;Arindam Mallik
Affiliations:
Northwestern University, Evanston, IL;Pennsylvania State University, PA;Northwestern University, Evanston, IL
Venue:
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Year:
2005

Abstract

The performance requirements of emerging embedded applications are rapidly increasing. One attractive approach to increasing processor performance while keeping energy consumption low is to exploit instruction-level parallelism. Hence, we are witnessing a significant increase in the number of superscalar embedded processors. In this paper, we present a method to reduce the energy consumption of such processors. In particular, we show that a) the load instructions in representative applications exhibit high address locality, i.e., two consecutive executions of the same load instruction are very likely to access the same data, and b) the register file utilization of these applications is usually low. To take advantage of these observations, we devise a load elimination scheme that tries to keep the data values of load instructions in the register file. Our results with 11 MediaBench applications reveal that this method eliminates 20.5% of all cache accesses, resulting in an 11.5% reduction in energy consumption.
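
The two observations in the abstract can be illustrated with a small trace-driven sketch. It is not the authors' implementation. The trace format (tuples of operation, instruction address, data address), the function names `address_locality` and `eliminated_loads`, the LRU replacement among spare registers, and the rule that a store invalidates held values for its address are all assumptions made for illustration only.

```python
from collections import OrderedDict


def address_locality(trace):
    """Fraction of dynamic loads that access the same address as the
    previous execution of the same static load (same pc).

    trace: iterable of (op, pc, addr) with op in {"load", "store"}.
    This tuple format is a hypothetical one chosen for the sketch.
    """
    last_addr = {}   # pc -> address used by the most recent execution
    repeats = 0
    loads = 0
    for op, pc, addr in trace:
        if op != "load":
            continue
        loads += 1
        if last_addr.get(pc) == addr:
            repeats += 1
        last_addr[pc] = addr
    return repeats / loads if loads else 0.0


def eliminated_loads(trace, spare_regs):
    """Count loads that could skip the cache in a toy model where up to
    `spare_regs` otherwise unused registers each hold the value last
    fetched by one static load.

    Assumptions (not from the paper): LRU replacement among the spare
    registers, and a store to an address drops every held entry for it.
    """
    held = OrderedDict()   # pc -> address whose value is held, LRU order
    eliminated = 0
    for op, pc, addr in trace:
        if op == "store":
            # Conservatively invalidate values that the store may change.
            for held_pc in [p for p, a in held.items() if a == addr]:
                del held[held_pc]
            continue
        if held.get(pc) == addr:
            # Same static load, same address, value still valid:
            # the cache access can be skipped.
            eliminated += 1
            held.move_to_end(pc)
            continue
        # Miss in the spare registers: access the cache and keep the value.
        held[pc] = addr
        held.move_to_end(pc)
        if len(held) > spare_regs:
            held.popitem(last=False)   # evict the least recently used entry
    return eliminated
```

In this toy model, the gap between the locality measure and the eliminated count comes from intervening stores and from the limited number of spare registers. That is why low register file utilization matters to the scheme: more unused registers means more load values can be held at once.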