A Case for Direct-Mapped Caches
Computer
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A practical algorithm for exact array dependence analysis
Communications of the ACM
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels
IEEE Transactions on Computers - Special issue on cache memory and related problems
EELRU: simple and effective adaptive page replacement
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The IA-64 Architecture at Work
Computer
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Load Scheduling with Profile Information
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A Matrix-Based Approach to the Global Locality Optimization Problem
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Improving the performance of virtual memory computers.
Improving the performance of virtual memory computers.
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
Counter-Based Cache Replacement Algorithms
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
International Journal of Parallel Programming
Investigating cache energy and latency break-even points in high performance processors
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Proceedings of the 20th annual international conference on Supercomputing
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Improving power efficiency with compiler-assisted cache replacement
Journal of Embedded Computing - Cache exploitation in embedded systems
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Investigating cache energy and latency break-even points in high performance processors
ACM SIGARCH Computer Architecture News
ACM SIGARCH Computer Architecture News
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
P-OPT: Program-Directed Optimal Cache Management
Languages and Compilers for Parallel Computing
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A concurrent dynamic analysis framework for multicore hardware
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Sampling Dead Block Prediction for Last-Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Minor memory references matter in collaborative caching
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
On the theory and potential of LRU-MRU collaborative cache management
Proceedings of the international symposium on Memory management
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
A study of the performance potential for dynamic instruction hints selection
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Link-time optimization for power efficiency in a tagless instruction cache
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A generalized theory of collaborative caching
Proceedings of the 2012 international symposium on Memory Management
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Pacman: program-assisted cache management
Proceedings of the 2013 international symposium on memory management
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Location-aware cache management for many-core processors with deep cache hierarchy
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
An efficient compiler framework for cache bypassing on GPUs
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Memory performance is increasingly determining microprocessor performance and technology trends are exacerbating this problem. Most architectures use set-associative caches with LRU replacement policies to combine fast access with relatively low miss rates. To improve replacement decisions in set-associative caches, we develop a new set of compiler algorithms that predict which data will and will not be reused and provide these hints to the architecture.We prove that the hints either match or improve hit rates over LRU. We describe a practical one-bit cache-line tag implementation of our algorithm, called evict-me. On a cache replacement, the architecture will replace a line for which the evict-me bit is set, or if none is set, it will use the LRU bits. We implement our compiler analysis and its output in the Scale compiler. On a variety of scientific programs, using the evict-me algorithm in both the level 1and 2 caches improves simulated cycle times by up to 34% over the LRU policy by increasing hit rates. In addition, a combination of simple hardware prefetching and evict-me works together to further improve performance.