Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Instruction buffering to reduce power in processors for signal processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler techniques for energy reduction in high-performance microprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low-power electronics and design
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Stack Value File: Custom Microarchitecture for the Stack
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Design of a Predictive Filter Cache for Energy Savings in High Performance Processor Architectures
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Computer
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
L1 data cache power reduction using a forwarding predictor
PATMOS'10 Proceedings of the 20th international conference on Integrated circuit and system design: power and timing modeling, optimization and simulation
Macro Data Load: An Efficient Mechanism for Enhancing Loaded Data Reuse
IEEE Transactions on Computers
Revisiting level-0 caches in embedded processors
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Designing a practical data filter cache to improve both energy efficiency and performance
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
As CPU data requests to the level-one (L1) data cache (DC) can represent as much as 25% of an embedded processor's total power dissipation, techniques that decrease L1 DC accesses can significantly enhance processor energy efficiency. Filter caches are known to efficiently decrease the number of accesses to instruction caches. However, due to the irregular access pattern of data accesses, a conventional data filter cache (DFC) has a high miss rate, which degrades processor performance. We propose to integrate a DFC with a fast address calculation technique to significantly reduce the impact of misses and to improve performance by enabling one-cycle loads. Furthermore, we show that DFC stalls can be eliminated even after unsuccessful fast address calculations, by simultaneously accessing the DFC and L1 DC on the following cycle. We quantitatively evaluate different DFC configurations, with and without the fast address calculation technique, using different write allocation policies, and qualitatively describe their impact on energy efficiency. The proposed design provides an efficient DFC that yields both energy and performance improvements.