MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Energy Benefits of a Configurable Line Size Cache for Embedded Systems
ISVLSI '03 Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI'03)
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Platune: a tuning framework for system-on-a-chip platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Tag Overflow Buffering: An Energy-Efficient Cache Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A first look at the interplay of code reordering and configurable caches
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Fast configurable-cache tuning with a unified second-level cache
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Cache size selection for performance, energy and reliability of time-constrained systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Online hardware/software partitioning in networked embedded systems
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Configurable cache subsetting for fast cache tuning
Proceedings of the 43rd annual Design Automation Conference
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
The effect of temperature on cache size tuning for low energy embedded systems
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Reducing cache energy consumption by tag encoding in embedded processors
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
A table-based method for single-pass cache optimization
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Tag overflow buffering: reducing total memory energy by reduced-tag matching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fast configurable-cache tuning with a unified second-level cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dynamic Cache Reconfiguration for Soft Real-Time Systems
ACM Transactions on Embedded Computing Systems (TECS)
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Memory accesses can account for about half of a microprocessor system's power consumption. Customizing a microprocessor cache's total size, line size and associativity to a particular program is well known to have tremendous benefits for performance and power. Customizing caches has until recently been restricted to core-based flows, in which a new chip will be fabricated. However, several configurable cache architectures have been proposed recently for use in pre-fabricated microprocessor platforms. Tuning those caches to a program is still however a cumbersome task left for designers, assisted in part by recent computer-aided design (CAD) tuning aids. We propose to move that CAD on-chip, which can greatly increase the acceptance of configurable caches. We introduce on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program. We carefully designed the heuristic to avoid any cache flushing, since flushing is power and performance costly. By simulating numerous Powerstone and MediaBench benchmarks, we show that such a dynamic self-tuning cache can reduce memory-access energy by 45% to 55% on average, and as much as 97%, compared with a four-way set-associative base cache, completely transparently to the programmer.