Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Filtering Memory References to Increase Energy Efficiency
IEEE Transactions on Computers
Journal of VLSI Signal Processing Systems - Special issue on system level design
Region-based caching: an energy-delay efficient memory architecture for embedded processors
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Energy-efficient design of battery-powered embedded systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Design space optimization of embedded memory systems via data remapping
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Adaptive Mode Control: A Static-Power-Efficient Cache Design
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Pursuing the Performance Potential of Dynamic Cache Line Sizes
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Reducing energy and delay using efficient victim caches
Proceedings of the 2003 international symposium on Low power electronics and design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
DSD '03 Proceedings of the Euromicro Symposium on Digital Systems Design
Reducing traffic generated by conflict misses in caches
Proceedings of the 1st conference on Computing frontiers
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Performance and power effectiveness in embedded processors customizable partitioned caches
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Partial product reduction by using look-up tables for M×N multiplier
Integration, the VLSI Journal
Hi-index | 0.00 |
Memory transfers, in particular from/to off-chip memories, consume a significant amount of power. In order to reduce the amount of off-chip memory traffic, one or more levels of cache can be employed, located on the same die as the processor core. For performance, energy, and cost reasons, it is expedient that the on-chip cache is small and direct-mapped. Small, direct-mapped caches, however, generally produce much more traffic than needed. The purpose of this paper is two-fold. First, to measure how much traffic is generated by small, direct-mapped caches and what the minimal amount of traffic is. This yields an upper bound on the amount of traffic that can be saved by utilizing the on-chip memory more effectively. Second, we survey some techniques that can be deployed to reduce the amount of traffic produced by direct-mapped caches and present results for some of these techniques.