Run-time spatial locality detection and optimization

Authors:
Teresa L. Johnson;Matthew C. Merten;Wen-Mei W. Hwu
Affiliations:
Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL;Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL;Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Venue:
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Year:
1997



Abstract

As the disparity between processor and main memory performance grows, the number of execution cycles spent waiting for memory accesses to complete also increases. As a result, latency-hiding techniques are critical for improved application performance on future processors. We present a microarchitectural scheme that detects and adapts to varying spatial locality, dynamically adjusting the amount of data fetched on a cache miss. The Spatial Locality Detection Table, introduced in this paper, facilitates the detection of spatial locality across adjacent cached blocks. Results from detailed simulations of several integer programs show significant speedups. The improvements come from two effects: small blocks and small fetch sizes reduce conflict and capacity misses when spatial locality is absent, and large fetch sizes act as prefetching when spatial locality exists.
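
The general idea of choosing the fetch size at miss time can be illustrated with a toy software model. The sketch below is not the paper's Spatial Locality Detection Table; it is a deliberately simplified stand-in. The block size, the region size, the LRU replacement, the class name `AdaptiveFetchCache`, and the rule "fetch the whole aligned region once more than one of its blocks has been referenced" are all assumptions made for illustration.

```python
from collections import OrderedDict

BLOCK = 8      # bytes per cache block (assumed value)
MAX_FETCH = 4  # blocks per aligned region, also the largest fetch (assumed)


class AdaptiveFetchCache:
    """Toy fully associative LRU cache with a per-region fetch-size hint.

    Hypothetical policy, not the paper's mechanism: a region starts with
    single-block fetches and switches to whole-region fetches once more
    than one of its blocks has been referenced.
    """

    def __init__(self, capacity_blocks=64):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # resident block numbers in LRU order
        self.touched = {}            # region -> set of referenced blocks
        self.fetch_blocks = {}       # region -> fetch size in blocks
        self.misses = 0

    def _install(self, blk):
        # Insert (or refresh) a block, evicting least recently used blocks.
        self.blocks[blk] = True
        self.blocks.move_to_end(blk)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def access(self, addr):
        """Reference one byte address; return True on a hit."""
        blk = addr // BLOCK
        region = blk // MAX_FETCH
        seen = self.touched.setdefault(region, set())
        seen.add(blk)
        if len(seen) > 1:
            # Neighbouring blocks were used: treat the region as spatially local.
            self.fetch_blocks[region] = MAX_FETCH
        if blk in self.blocks:
            self.blocks.move_to_end(blk)
            return True
        self.misses += 1
        n = self.fetch_blocks.get(region, 1)
        start = region * MAX_FETCH if n == MAX_FETCH else blk
        for b in range(start, start + n):
            self._install(b)
        return False


def run(addresses, capacity_blocks=64):
    """Replay an address trace; return (miss count, resident block count)."""
    cache = AdaptiveFetchCache(capacity_blocks)
    for a in addresses:
        cache.access(a)
    return cache.misses, len(cache.blocks)


# Sequential scan over 64 consecutive blocks (spatial locality present).
sequential = run(range(0, 64 * BLOCK, BLOCK))
# One block per region (no spatial locality at this granularity).
strided = run(range(0, 16 * BLOCK * MAX_FETCH, BLOCK * MAX_FETCH))
```

In this model the sequential trace misses less often than a fixed one-block fetch would, because the larger fetch brings in neighbouring blocks before they are referenced, while the strided trace never fetches more than the block it needs and so does not fill the cache with unused data. A hardware scheme would track this information very differently; the sketch only shows the two effects the abstract names.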