High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Tolerating data access latency with register preloading
ICS '92 Proceedings of the 6th international conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Program analysis and optimization for machines with instruction cache
Program analysis and optimization for machines with instruction cache
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Run-time adaptive cache management
Run-time adaptive cache management
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Morphable Cache Architectures: Potential Benefits
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
MIST: an algorithm for memory miss traffic management
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Compiler managed micro-cache bypassing for high performance EPIC processors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Self-correcting LRU replacement policies
Proceedings of the 1st conference on Computing frontiers
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Discretionary Caching for I/O on Clusters
Cluster Computing
Optimizing embedded applications using programmer-inserted hints
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Page mapping for heterogeneously partitioned caches: Complexity and heuristics
Journal of Embedded Computing - Cache exploitation in embedded systems
ACM SIGARCH Computer Architecture News
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
Divide-and-conquer: a bubble replacement for low level caches
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Global management of cache hierarchies
Proceedings of the 7th ACM international conference on Computing frontiers
Where replacement algorithms fail: a thorough analysis
Proceedings of the 7th ACM international conference on Computing frontiers
Instruction-based reuse-distance prediction for effective cache management
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Enhancing last-level cache performance by block bypassing and early miss determination
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Code-based cache partitioning for improving hardware cache performance
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
Combining recency of information with selective random and a victim cache in last-level caches
ACM Transactions on Architecture and Code Optimization (TACO)
Optimal bypass monitor for high performance last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Exploiting single-usage for effective memory management
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Improving Cache Management Policies Using Dynamic Reuse Distances
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
MALEC: a multiple access low energy cache
Proceedings of the Conference on Design, Automation and Test in Europe
Managing shared last-level cache in a heterogeneous multicore processor
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 14.98 |
The growing disparity between processor and memory performance has made cache misses increasingly expensive. Additionally, data and instruction caches are not always used efficiently, resulting in large numbers of cache misses. Therefore, the importance of cache performance improvements at each level of the memory hierarchy will continue to grow. In numeric programs, there are several known compiler techniques for optimizing data cache performance. However, integer (nonnumeric) programs often have irregular access patterns that are more difficult for the compiler to optimize. In the past, cache management techniques such as cache bypassing were implemented manually at the machine-language-programming level. As the available chip area grows, it makes sense to spend more resources to allow intelligent control over the cache management. In this paper, we present an approach to improving cache effectiveness, taking advantage of the growing chip area, utilizing run-time adaptive cache management techniques, optimizing both performance and cost of implementation. Specifically, we are aiming to increase data cache effectiveness for integer programs. We propose a microarchitecture scheme where the hardware determines data placement within the cache hierarchy based on dynamic referencing behavior. This scheme is fully compatible with existing Instruction Set Architectures. This paper examines the theoretical upper bounds on the cache hit ratio that cache bypassing can provide for integer applications, including several Windows applications with OS activity. Then, detailed trace-driven simulations of the integer applications are used to show that the implementation described in this paper can achieve performance close to that of the upper bound.