Revisiting level-0 caches in embedded processors

Authors:
Nam Duong;Taesu Kim;Dali Zhao;Alexander V. Veidenbaum
Affiliations:
University of California, Irvine, Irvine, California, USA;University of California, Irvine, Irvine, California, USA;University of California, Irvine, Irvine, California, USA;University of California, Irvine, Irvine, California, USA
Venue:
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2012

Abstract

Level-0 (L0) caches have been proposed in the past as an inexpensive way to improve performance and reduce energy consumption in resource-constrained embedded processors. This paper proposes new L0 data cache organizations based on the assumption that an L0 hit/miss determination can be completed prior to the L1 access. This is a realistic assumption for very small L0 caches, which can nevertheless deliver a significant reduction in miss rate and/or energy. The key issue for such caches is how and when to move data between the L0 and L1 caches. The first new cache, a flow cache, targets conflict-miss reduction in a direct-mapped L1 cache. It offers a simpler hardware design and uses on average 10% less dynamic energy than the victim cache, with nearly identical performance. The second new cache, a hit cache, reduces the dynamic energy consumption in a set-associative L1 cache by 30% without impacting performance. A variant of this policy reduces the dynamic energy consumption by up to 50%, at the cost of a 5% performance degradation.
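
The energy argument behind the early hit/miss assumption can be illustrated with a toy model. The sketch below is not the paper's flow cache or hit cache, whose data-movement policies the abstract does not spell out. It assumes a tiny fully associative L0 with LRU replacement that is filled on every miss, and it uses hypothetical names and values (`simulate`, `LINE_BYTES`, and the per-access energies `e_l0` and `e_l1`). It only shows that when an L0 hit is known before the L1 is accessed, the L1 access and its energy can be skipped.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed cache line size (hypothetical)


def simulate(trace, l0_entries=4, e_l0=1.0, e_l1=10.0):
    """Return (l0_hits, l1_accesses, energy) for a list of byte addresses.

    e_l0 and e_l1 are made-up per-access energy units, not measured values.
    """
    l0 = OrderedDict()  # line address -> None, ordered oldest to newest
    l0_hits = 0
    l1_accesses = 0
    energy = 0.0
    for addr in trace:
        line = addr // LINE_BYTES
        energy += e_l0               # the L0 probe always happens first
        if line in l0:
            l0_hits += 1
            l0.move_to_end(line)     # refresh LRU position
            continue                 # hit known early: L1 is not activated
        l1_accesses += 1
        energy += e_l1               # L1 is accessed only on an L0 miss
        l0[line] = None              # assumed policy: fill L0 on every miss
        if len(l0) > l0_entries:
            l0.popitem(last=False)   # evict the least recently used line
    return l0_hits, l1_accesses, energy
```

For the trace `[0, 4, 8, 64, 0, 68]`, this model reports 4 L0 hits, 2 L1 accesses and 26.0 energy units, compared with 60.0 units if all six accesses went to the L1 at the assumed cost. Real savings depend on the actual L0 policy, the L1 organization and the workload.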