Most recent cache designs use direct-mapped caches to provide the fast access time required by modern high-speed CPUs. Unfortunately, direct-mapped caches have higher miss rates than set-associative caches, largely because direct-mapped caches are more sensitive to conflicts between items needed frequently in the same phase of program execution.

This paper presents a new technique for reducing direct-mapped cache misses caused by conflicts for a particular cache line. A small finite state machine recognizes the common instruction reference patterns where storing an instruction in the cache actually harms performance. Such instructions are dynamically excluded; that is, they are passed directly through the cache without being stored. This reduces misses to the instructions that would have been replaced.

The effectiveness of dynamic exclusion depends on the severity of cache conflicts, and thus on the particular program and cache size of interest. However, across the SPEC benchmarks, simulation results show an average reduction in miss rate of 33% for a 32 KB instruction cache with 16-byte lines. In addition, applying dynamic exclusion to one level of a cache hierarchy can improve the performance of the next level, since instructions do not need to be stored on both levels. Finally, dynamic exclusion also improves combined instruction and data cache miss rates.
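The idea of excluding a conflicting instruction can be illustrated with a small simulation. The following Python sketch is not the finite state machine from the paper, whose exact states are not given in this abstract. It assumes a much simpler, hypothetical policy: each cache line carries one "sticky" bit, and a conflicting block is passed through without being stored whenever the resident block has hit since the last conflict. The function name `simulate`, the sticky-bit rule, and the example trace are all assumptions made for illustration.

```python
def simulate(trace, num_lines, exclusion):
    """Count misses in a direct-mapped cache over a trace of block addresses.

    With exclusion=False this is a conventional direct-mapped cache.
    With exclusion=True a simplified, hypothetical bypass rule is applied;
    it is NOT the state machine described in the paper.
    """
    tags = [None] * num_lines      # block currently stored in each line
    sticky = [False] * num_lines   # True if the resident block should be protected
    misses = 0
    for block in trace:
        idx = block % num_lines    # direct-mapped: one possible line per block
        if tags[idx] == block:
            sticky[idx] = True     # a hit renews the resident block's protection
            continue
        misses += 1
        if exclusion and tags[idx] is not None and sticky[idx]:
            # Exclude the incoming block: serve it without storing it,
            # and drop the protection so a second conflict can replace.
            sticky[idx] = False
        else:
            tags[idx] = block      # ordinary replacement
            sticky[idx] = True
    return misses


if __name__ == "__main__":
    # Blocks 0 and 8 map to the same line of an 8-line cache.
    # Block 0 stands for an inner loop (10 references), block 8 for
    # code executed once per outer iteration.
    trace = ([0] * 10 + [8]) * 10
    print("conventional:", simulate(trace, 8, exclusion=False))
    print("with exclusion:", simulate(trace, 8, exclusion=True))
```

On this synthetic trace the conventional cache misses 20 times, because block 8 evicts block 0 on every outer iteration, while the simplified exclusion rule misses 11 times, because block 8 is never stored and block 0 stays resident. These counts describe only this toy trace and say nothing about the SPEC results quoted above.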