Run-time adaptive cache hierarchy management via reference analysis

Authors:
Teresa L. Johnson; Wen-mei W. Hwu
Affiliations:
Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL (both authors)
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Abstract

Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and main memory performance is expected to grow, increasing the number of execution cycles spent waiting for memory accesses to complete. One solution to this growing problem is to reduce the number of cache misses by increasing the effectiveness of the cache hierarchy. In this paper we present a technique for dynamic analysis of program data access behavior, which is then used to proactively guide the placement of data within the cache hierarchy in a location-sensitive manner. We introduce the concept of a macroblock, which allows us to feasibly characterize the memory locations accessed by a program, and a Memory Address Table, which performs the dynamic reference analysis. Our technique is fully compatible with existing Instruction Set Architectures. Results from detailed simulations of several integer programs show significant speedups.
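
The abstract names two mechanisms, the macroblock and the Memory Address Table, without giving their parameters. The following is a minimal sketch of how the two could fit together, not the authors' design. The macroblock size, the saturating counter, and the bypass rule in `should_bypass` are all assumptions made for illustration, and every identifier is hypothetical.

```python
from dataclasses import dataclass, field

# Assumed values: the abstract does not state a macroblock size or counter width.
MACROBLOCK_BYTES = 1024
COUNTER_MAX = 255


def macroblock_of(address: int) -> int:
    """Map a byte address to the index of the macroblock containing it."""
    return address // MACROBLOCK_BYTES


@dataclass
class MemoryAddressTable:
    """Toy reference-analysis table: one saturating access counter per macroblock."""

    counters: dict = field(default_factory=dict)

    def record_access(self, address: int) -> None:
        """Count one reference against the macroblock that holds `address`."""
        mb = macroblock_of(address)
        self.counters[mb] = min(self.counters.get(mb, 0) + 1, COUNTER_MAX)

    def usage(self, address: int) -> int:
        """Return the access count recorded for the macroblock of `address`."""
        return self.counters.get(macroblock_of(address), 0)

    def should_bypass(self, incoming: int, resident: int) -> bool:
        """Hypothetical placement rule: on a miss, keep the resident data and
        bypass the cache when the incoming address belongs to a macroblock
        that has been referenced less often than the resident one."""
        return self.usage(incoming) < self.usage(resident)
```

Tracking counts per macroblock rather than per cache block keeps the table small, which is one plausible reading of "feasibly characterize" in the abstract. In this sketch, a miss to a rarely referenced region would leave heavily referenced data in place instead of evicting it.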