Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors

Authors:
Yong Chen;Huaiyu Zhu;Hui Jin;Xian-He Sun
Affiliations:
Department of Computer Science, Texas Tech University, United States;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, United States;Department of Computer Science, Illinois Institute of Technology, United States;Department of Computer Science, Illinois Institute of Technology, United States
Venue:
Parallel Computing
Year:
2012

Abstract

Rapid advances in processor architecture, including the emergence of multicore designs and substantially increased on-chip computing capability, have put more pressure than ever on comparatively slow memory systems. Meanwhile, many applications are becoming increasingly data intensive. Data-access delay, rather than processor speed, has become the leading performance bottleneck in high-performance computing. Data prefetching is an effective way to accelerate applications' data access and to bridge the growing gap between computing speed and data-access speed. Existing prefetching approaches, however, are generally very conservative, a legacy of past concerns about consuming computing power. Their effectiveness is low, especially when an application's access pattern changes. In this study, we propose an Algorithm-level Feedback-controlled Adaptive (AFA) data prefetcher to address these issues. The AFA prefetcher is built on the Data-Access History Cache, a hardware structure designed specifically for data-access acceleration. It provides algorithm-level adaptation and dynamically adapts to an appropriate prefetching algorithm at runtime. We conducted extensive simulations with the SimpleScalar simulator to validate the design and to analyze the performance gain. The results show that the AFA prefetcher is effective and achieves considerable IPC (Instructions Per Cycle) improvement on 21 representative SPEC-CPU benchmarks.
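
To make the idea of algorithm-level adaptation concrete, the following is a minimal software sketch and not the AFA design itself. The abstract does not specify which prefetching algorithms AFA chooses among, what feedback metrics it uses, or how the Data-Access History Cache is organized, so everything here is an assumption for illustration: the two candidate algorithms (a simple stride predictor and a first-order Markov predictor), the class names `StridePrefetcher`, `MarkovPrefetcher`, and `AdaptiveSelector`, the `epoch` parameter, and the use of per-epoch prediction accuracy as the feedback signal.

```python
class StridePrefetcher:
    """Hypothetical candidate: predicts last address + last observed stride."""

    def __init__(self):
        self.last_addr = None
        self.stride = 0

    def predict(self):
        if self.last_addr is None or self.stride == 0:
            return None
        return self.last_addr + self.stride

    def update(self, addr):
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr


class MarkovPrefetcher:
    """Hypothetical candidate: predicts the address that last followed the current one."""

    def __init__(self):
        self.last_addr = None
        self.successor = {}

    def predict(self):
        return self.successor.get(self.last_addr)

    def update(self, addr):
        if self.last_addr is not None:
            self.successor[self.last_addr] = addr
        self.last_addr = addr


class AdaptiveSelector:
    """Assumed feedback loop: every `epoch` accesses, activate the candidate
    whose predictions matched the most actual accesses in that epoch."""

    def __init__(self, candidates, epoch=8):
        self.candidates = candidates          # name -> prefetcher object
        self.epoch = epoch
        self.hits = {name: 0 for name in candidates}
        self.seen = 0
        self.active = next(iter(candidates))  # start with the first candidate

    def access(self, addr):
        # Score each candidate's pending prediction, then let it observe addr.
        for name, prefetcher in self.candidates.items():
            if prefetcher.predict() == addr:
                self.hits[name] += 1
            prefetcher.update(addr)
        self.seen += 1
        # End of epoch: switch to the most accurate candidate, reset counters.
        if self.seen == self.epoch:
            self.active = max(self.hits, key=self.hits.get)
            self.hits = {name: 0 for name in self.candidates}
            self.seen = 0
        # The address to prefetch comes from the currently active algorithm.
        return self.candidates[self.active].predict()


selector = AdaptiveSelector(
    {"stride": StridePrefetcher(), "markov": MarkovPrefetcher()}, epoch=8
)
for addr in range(0, 16 * 64, 64):      # regular, strided phase
    selector.access(addr)
print(selector.active)                  # stride
for addr in [100, 7, 350, 42] * 8:      # repeating irregular phase
    selector.access(addr)
print(selector.active)                  # markov
```

When the access pattern changes from strided to a repeating irregular sequence, the stride predictor stops matching, the Markov predictor starts matching after one pass over the sequence, and the selector switches at the next epoch boundary. A hardware prefetcher would likely weigh more than accuracy alone (for example timeliness or bandwidth cost); this sketch deliberately ignores those factors.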