Towards a performance- and energy-efficient data filter cache

Authors:
Alen Bardizbanyan;Magnus Själander;David Whalley;Per Larsson-Edefors
Affiliations:
Chalmers University of Technology;Florida State University;Florida State University;Chalmers University of Technology
Venue:
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Year:
2013

Abstract

Because CPU data requests to the level-one (L1) data cache (DC) can account for as much as 25% of an embedded processor's total power dissipation, techniques that reduce L1 DC accesses can significantly improve processor energy efficiency. Filter caches are known to efficiently reduce the number of accesses to instruction caches. However, because data accesses follow irregular patterns, a conventional data filter cache (DFC) has a high miss rate, which degrades processor performance. We propose integrating a DFC with a fast address calculation technique to significantly reduce the impact of misses and to improve performance by enabling one-cycle loads. Furthermore, we show that DFC stalls can be eliminated even after unsuccessful fast address calculations by accessing the DFC and L1 DC simultaneously on the following cycle. We quantitatively evaluate different DFC configurations, with and without the fast address calculation technique, using different write allocation policies, and qualitatively describe their impact on energy efficiency. The proposed design provides an efficient DFC that yields both energy and performance improvements.
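
To make the idea concrete, the following is a minimal sketch of a direct-mapped filter cache placed in front of the L1 DC, counting how many loads the filter absorbs. It is not the authors' design: the line size, the number of lines, the class name FilterCacheModel, and the function speculation_succeeds are all assumptions chosen for illustration. In particular, the speculation check uses one simple criterion (adding the offset to the base must not move the access to a different cache line), which may differ from the condition used in the paper.

```python
from dataclasses import dataclass, field

LINE_BYTES = 16  # assumed line size in bytes (illustrative, not from the paper)
NUM_LINES = 8    # assumed DFC capacity in lines (illustrative, not from the paper)


@dataclass
class FilterCacheModel:
    """Hypothetical direct-mapped data filter cache in front of an L1 DC."""
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)
    dfc_hits: int = 0
    l1_accesses: int = 0

    def load(self, base: int, offset: int) -> bool:
        """Perform a load at base + offset; return True on a DFC hit."""
        line = (base + offset) // LINE_BYTES
        index = line % NUM_LINES
        if self.tags[index] == line:
            self.dfc_hits += 1       # served by the DFC, no L1 DC access
            return True
        self.l1_accesses += 1        # miss: fetch the line from the L1 DC
        self.tags[index] = line      # fill the DFC, replacing the old line
        return False


def speculation_succeeds(base: int, offset: int) -> bool:
    """Assumed criterion: the base alone already identifies the right line,
    i.e. adding the offset does not change the line number."""
    return (base + offset) // LINE_BYTES == base // LINE_BYTES


if __name__ == "__main__":
    dfc = FilterCacheModel()
    for base, offset in [(0x1000, 0), (0x1000, 4), (0x1000, 12), (0x1000, 16)]:
        hit = dfc.load(base, offset)
        print(hex(base + offset), "hit" if hit else "miss",
              "speculation ok" if speculation_succeeds(base, offset) else "speculation failed")
    print("DFC hits:", dfc.dfc_hits, "L1 DC accesses:", dfc.l1_accesses)
```

In this toy trace, the loads that stay within one line hit in the filter after the first miss, so only the accesses that touch a new line reach the L1 DC. The model ignores stores, write allocation policies, and timing, all of which the paper evaluates.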