Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Correlation-based hardware prefetching
Correlation-based hardware prefetching
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
On the Stability of Temporal Data Reference Profiles
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Microarchitecture and Design Challenges for Gigascale Integration
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Single-Chip Multiprocessors: The Next Wave of Computer Architecture Innovation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Performance Comparison of DRAM Memory System Optimizations for SMT Processors
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Due to shared cache contentions and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiprocessors (CMPs). Through deep analysis of SPEC2000 applications, we find that a part of the nearby data memory references often exhibit highlyrepeated patterns with long, but equal block reuse distance. These references can form a coterminous group (CG). Coterminous locality is introduced as that when a member in a CG is referenced, the remaining members will likely be referenced in the near future. Based on the coterminous locality behavior, we implement a novel CG data prefetcher on CMPs. Performance evaluations show that the proposed prefetcher can accurately cover up to 40-50% of the total misses, and result in 50-60% of potential performance improvement for several selected workload mixes.