Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Guest Editors' Introduction-Cache Memory and Related Problems: Enhancing and Exploiting the Locality
IEEE Transactions on Computers - Special issue on cache memory and related problems
Effects of Multithreading on Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Partitioned instruction cache architecture for energy efficiency
ACM Transactions on Embedded Computing Systems (TECS)
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Power Efficient Cache Structure for Embedded Processors Based on the Dual Cache Structure
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Improved indexing for cache miss reduction in embedded systems
Proceedings of the 40th annual Design Automation Conference
Tiling, Block Data Layout, and Memory Hierarchy Performance
IEEE Transactions on Parallel and Distributed Systems
A new hybrid approach to exploit localities: LRFU with adaptive prefetching
ACM SIGMETRICS Performance Evaluation Review
Dynamic techniques to reduce memory traffic in embedded systems
Proceedings of the 1st conference on Computing frontiers
An Integrated Approach for Improving Cache Behavior
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Tag Overflow Buffering: An Energy-Efficient Cache Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Simple penalty-sensitive replacement policies for caches
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 33rd annual international symposium on Computer Architecture
Programmable bus/memory controllers in modern computer architecture
Proceedings of the 43rd annual Southeast regional conference - Volume 1
Detailed cache simulation for detecting bottleneck, miss reason and optimization potentialities
valuetools '06 Proceedings of the 1st international conference on Performance evaluation methodolgies and tools
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Characteristics of workloads used in high performance and technical computing
Proceedings of the 21st annual international conference on Supercomputing
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Revisiting Cache Block Superloading
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Code and Data Placement for Embedded Processors with Scratchpad and Cache Memories
Journal of Signal Processing Systems
Enabling adaptive live streaming in P2P multipath networks
The Journal of Supercomputing
Tag overflow buffering: reducing total memory energy by reduced-tag matching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
As the disparity between processor and main memory performance grows, the number of execution cycles spent waiting for memory accesses to complete also increases. As a result, latency hiding techniques are critical for improved application performance on future processors. We present a microarchitecture scheme which detects and adapts to varying spatial locality, dynamically adjusting the amount of data fetched on a cache miss. The Spatial Locality Detection Table, introduced in this paper, facilitates the detection of spatial locality across adjacent cached blocks. Results from detailed simulations of several integer programs show significant speedups. The improvements are due to the reduction of conflict and capacity misses by utilizing small blocks and small fetch sizes when spatial locality is absent, and the prefetching effect of large fetch sizes when spatial locality exists.