IEEE Transactions on Computers
The performance requirements of emerging embedded applications are rapidly increasing. One attractive approach to increasing processor performance while keeping energy consumption low is to exploit instruction-level parallelism. Hence, we are witnessing a significant increase in the number of superscalar embedded processors. In this paper, we present a method to reduce the energy consumption of such processors. In particular, we show that a) the load instructions in representative applications exhibit high address locality, i.e., two consecutive executions of the same load instruction are very likely to access the same data, and b) the register file utilization of these applications is usually low. To take advantage of these observations, we devise a load elimination scheme that tries to keep the data values of load instructions in the register file. Our results with 11 MediaBench applications reveal that this method eliminates 20.5% of all cache accesses, resulting in an 11.5% reduction in energy consumption.
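As an illustration of the address-locality observation in (a), the following minimal sketch counts how often a load instruction accesses the same address as its previous execution. It is not the paper's implementation: the trace format (a sequence of `(pc, address)` pairs, one per executed load) and the function name `address_locality` are assumptions made for this example, and the sketch ignores intervening stores, which a real elimination scheme would have to account for.

```python
def address_locality(trace):
    """Return the fraction of executed loads whose address equals the
    address used by the previous execution of the same static load.

    `trace` is a hypothetical format: an iterable of (pc, address) pairs,
    where `pc` identifies the static load instruction.
    """
    last_address = {}  # pc -> address seen at that load's previous execution
    repeats = 0
    total = 0
    for pc, address in trace:
        total += 1
        # A first execution has no previous address, so it never counts.
        if last_address.get(pc) == address:
            repeats += 1
        last_address[pc] = address
    return repeats / total if total else 0.0


# Made-up trace: the load at 0x10 always reads address 100, while the
# load at 0x14 walks through memory and never repeats an address.
example_trace = [(0x10, 100), (0x10, 100), (0x14, 200), (0x10, 100), (0x14, 204)]
locality = address_locality(example_trace)
```

A load that repeats its address is a candidate for elimination only if the value is still available, for instance in an otherwise unused register, which is where the low register file utilization in (b) comes in.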