Reducing data cache energy consumption via cached load/store queue

Authors:
Dan Nicolaescu;Alex Veidenbaum;Alex Nicolau
Affiliations:
University of California, Irvine, CA;University of California, Irvine, CA;University of California, Irvine, CA
Venue:
Proceedings of the 2003 international symposium on Low power electronics and design
Year:
2003

Abstract

High-performance processors use a large set-associative L1 data cache with multiple ports. As clock speeds and cache sizes increase, such a cache consumes a significant percentage of the total processor energy. This paper proposes a method of saving energy by reducing the number of data cache accesses. It does so by modifying the Load/Store Queue (LSQ) design to allow "caching" of previously accessed data values on both loads and stores after the corresponding memory access instruction has been committed. A 32-entry modified LSQ design is shown to allow an average of 38.5% of the loads in the SpecINT95 benchmarks and 18.9% in the SpecFP95 benchmarks to get their data from the LSQ. The reduction in the number of L1 cache accesses results in up to a 40% reduction in L1 data cache energy consumption and up to a 16% improvement in the energy-delay product, while requiring almost no additional hardware or complex control logic.
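
The idea of letting committed LSQ entries serve later loads can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the authors' design: the class name `CachedLSQ`, the FIFO eviction policy, the word-granular address matching, and the choice to count every store as an L1 access are all assumptions made for illustration. The abstract does not specify any of these details.

```python
from collections import OrderedDict


class CachedLSQ:
    """Toy model of an LSQ that keeps (address -> value) pairs after commit.

    Assumptions (not from the paper): FIFO eviction, exact address match,
    and stores always write the L1 cache.
    """

    def __init__(self, entries=32):
        self.entries = entries
        self.queue = OrderedDict()  # address -> value, oldest entry first
        self.lsq_hits = 0           # loads served from the LSQ
        self.cache_accesses = 0     # accesses that reached the L1 cache

    def _insert(self, addr, value):
        # A new access takes a fresh entry; drop any stale entry for the
        # same address, or evict the oldest entry when the queue is full.
        if addr in self.queue:
            del self.queue[addr]
        elif len(self.queue) >= self.entries:
            self.queue.popitem(last=False)
        self.queue[addr] = value

    def store(self, addr, value, memory):
        memory[addr] = value        # assumed: the store still writes L1
        self.cache_accesses += 1
        self._insert(addr, value)

    def load(self, addr, memory):
        if addr in self.queue:      # value retained from an earlier access
            self.lsq_hits += 1
            return self.queue[addr]
        self.cache_accesses += 1    # LSQ miss: read the L1 cache
        value = memory.get(addr, 0)
        self._insert(addr, value)
        return value


# Usage: a 2-entry queue, with a dict standing in for the L1 data cache.
memory = {}
lsq = CachedLSQ(entries=2)
lsq.store(0x10, 7, memory)
first = lsq.load(0x10, memory)   # served from the LSQ, no cache access
lsq.load(0x20, memory)           # miss, fills the second entry
lsq.load(0x30, memory)           # miss, evicts the entry for 0x10
again = lsq.load(0x10, memory)   # miss again, value comes from memory
```

In this model the fraction `lsq_hits / (lsq_hits + cache_accesses)` plays the role of the share of accesses that avoid the L1 cache; the percentages reported in the abstract come from the authors' benchmark evaluation and cannot be reproduced by this sketch.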