Coterminous locality and coterminous group data prefetching on chip-multiprocessors

Authors:
Xudong Shi;Zhen Yang;Jih-Kwon Peir;Lu Peng;Yen-Kuang Chen;Victor Lee;Bob Liang
Affiliations:
Computer & Information Science & Engineering, University of Florida, Gainesville, FL;Computer & Information Science & Engineering, University of Florida, Gainesville, FL;Computer & Information Science & Engineering, University of Florida, Gainesville, FL;Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA;Architecture Research Lab, Intel Corporation, Santa Clara, CA;Architecture Research Lab, Intel Corporation, Santa Clara, CA;Architecture Research Lab, Intel Corporation, Santa Clara, CA
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

On chip multiprocessors (CMPs), shared-cache contention and interconnect delays make data prefetching more critical for alleviating the penalties of increasing memory latency and memory demand. Through detailed analysis of SPEC2000 applications, we find that a portion of nearby data memory references often exhibit highly repeated patterns with long but equal block reuse distances. These references can form a coterminous group (CG). Coterminous locality is the property that when one member of a CG is referenced, the remaining members are likely to be referenced in the near future. Based on this coterminous locality behavior, we implement a novel CG data prefetcher on CMPs. Performance evaluations show that the proposed prefetcher can accurately cover up to 40-50% of the total misses and achieve 50-60% of the potential performance improvement for several selected workload mixes.
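
The idea of a coterminous group can be illustrated with a small sketch. This is not the hardware prefetcher described in the paper. It is a toy software model built on stated assumptions: the block reuse distance of a reference is taken to be the number of distinct other blocks touched since the previous reference to the same block, "nearby" is taken to mean adjacent in the trace, and the names `reuse_distances`, `coterminous_groups`, `prefetch_candidates`, and the `min_distance` threshold are hypothetical.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Per reference: distinct other blocks touched since the previous
    reference to the same block, or None on the first touch (assumed definition)."""
    stack = OrderedDict()  # LRU stack, most recently used block last
    out = []
    for block in trace:
        if block in stack:
            keys = list(stack)
            dist = len(keys) - 1 - keys.index(block)
            del stack[block]
        else:
            dist = None
        stack[block] = True  # move (or add) block to the most-recent position
        out.append(dist)
    return out


def coterminous_groups(trace, min_distance=2):
    """Group adjacent references whose reuse distances are equal and long
    (at least min_distance). Returns a list of (distance, [blocks])."""
    groups = []
    current, current_dist = [], None
    for block, dist in zip(trace, reuse_distances(trace)):
        is_long = dist is not None and dist >= min_distance
        if is_long and dist == current_dist:
            current.append(block)
            continue
        if len(current) > 1:  # close the previous run if it formed a group
            groups.append((current_dist, current))
        current, current_dist = ([block], dist) if is_long else ([], None)
    if len(current) > 1:
        groups.append((current_dist, current))
    return groups


def prefetch_candidates(groups, block):
    """When a member of a group is referenced, return the other members,
    which coterminous locality suggests will be referenced soon."""
    for _, members in groups:
        if block in members:
            return [m for m in members if m != block]
    return []


trace = ["A", "B", "C", "X", "Y", "Z", "A", "B", "C"]
groups = coterminous_groups(trace)
candidates = prefetch_candidates(groups, "B")
```

In this trace, blocks A, B, and C are each reused after the same number of intervening distinct blocks, so they fall into one group, and a later reference to B would nominate A and C as prefetch candidates. A real design would have to detect groups online with bounded storage, which this sketch does not attempt.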