Global-aware and multi-order context-based prefetching for high-performance processors

Authors:
Yong Chen;Huaiyu Zhu;Philip C. Roth;Hui Jin;Xian-He Sun
Affiliations:
Department of Computer Science, Texas Tech University, USA;Department of Computer Science, University of Illinois at Urbana Champaign, USA;Computer Science and Mathematics Division, Oak Ridge National Laboratory, USA;Department of Computer Science, Illinois Institute of Technology, USA;Department of Computer Science, Illinois Institute of Technology, USA
Venue:
International Journal of High Performance Computing Applications
Year:
2011

Citing 36
Cited 0

Evaluating stream buffers as a secondary cache replacement

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Going the distance for TLB prefetching: an application-driven study

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
When Caches Aren't Enough: Data Prefetching Techniques

Computer
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Call graph prefetching for database applications

ACM Transactions on Computer Systems (TOCS)
A Prefetch Taxonomy

IEEE Transactions on Computers
Reflections on the memory wall

Proceedings of the 1st conference on Computing frontiers
AC/DC: An Adaptive Data Cache Prefetcher

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Data Cache Prefetching Using a Global History Buffer

IEEE Micro
Exploring the limits of prefetching

IBM Journal of Research and Development - Electrochemical technology in microelectronics
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Temporal Streaming of Shared Memory

Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Bulk Disambiguation of Speculative Threads in Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
CAVA: Using checkpoint-assisted value prediction to hide L2 misses

ACM Transactions on Architecture and Code Optimization (TACO)
Computer Architecture, Fourth Edition: A Quantitative Approach

Computer Architecture, Fourth Edition: A Quantitative Approach
SPEC CPU2006 benchmark tools

ACM SIGARCH Computer Architecture News
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors

ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Data prefetching in a cache hierarchy with high bandwidth and capacity

ACM SIGARCH Computer Architecture News
Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Data access history cache and associated data prefetching mechanisms

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Spatio-temporal memory streaming

Proceedings of the 36th annual international symposium on Computer architecture
Stream chaining: exploiting multiple levels of correlation in data prefetching

Proceedings of the 36th annual international symposium on Computer architecture
Coordinated control of multiple prefetchers in multi-core systems

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Inter-core cooperative TLB for chip multiprocessors

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Improving the Effectiveness of Context-Based Prefetching with Multi-order Analysis

ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data prefetching is widely used in high-end computing systems to accelerate data accesses and to bridge the increasing performance gap between processor and memory. Context-based prefetching has become a primary focus of study in recent years due to its general applicability. However, current context-based prefetchers only adopt the context analysis of a single order, which suffers from low prefetching coverage and thus limits the overall prefetching effectiveness. Also, existing approaches usually consider the context of the address stream from a single instruction but not the context of the address stream from all instructions, which further limits the context-based prefetching effectiveness. In this study, we propose a new context-based prefetcher called the Global-aware and Multi-order Context-based (GMC) prefetcher. The GMC prefetcher uses multi-order, local and global context analysis to increase prefetching coverage while maintaining prefetching accuracy. In extensive simulation testing of the SPEC-CPU2006 benchmarks with an enhanced CMP$im simulator, the proposed GMC prefetcher was shown to outperform existing prefetchers and to reduce the data-access latency effectively. The average Instructions Per Cycle (IPC) improvement of SPEC CINT2006 and CFP2006 benchmarks with GMC prefetching was over 55% and 44% respectively.