Correlated load-address predictors

Authors:
Michael Bekerman, Stephan Jourdan, Ronny Ronen, Gilad Kirshenboim, Lihu Rappoport, Adi Yoaz, Uri Weiser
Affiliation:
Intel Corporation, Intel Israel (74) Ltd., Haifa 31015, Israel (all authors)
Venue:
ISCA '99: Proceedings of the 26th Annual International Symposium on Computer Architecture
Year:
1999

Abstract

As microprocessors become faster, the relative performance cost of memory accesses increases. Bigger and faster caches significantly reduce the absolute load-to-use delay. However, higher processor operating frequencies increase the load-to-use latency when it is measured in processor cycles (e.g., from two cycles on the Pentium® processor to three or more cycles in current designs). Load-address prediction techniques were introduced to partially reduce this latency. This paper focuses on advanced address-prediction schemes that further shorten program execution time.

Existing address-prediction schemes can predict simple address patterns, consisting mainly of constant addresses or stride-based addresses. This paper explores the characteristics of the remaining loads and proposes new, enhanced techniques to improve prediction effectiveness:
• Context-based prediction to handle part of the remaining, difficult-to-predict load instructions.
• New prediction algorithms that exploit global correlation among different static loads.
• New confidence mechanisms that increase the correct-prediction rate and eliminate costly mispredictions.
• Mechanisms that prevent long or random address sequences from polluting the predictor data structures while providing some hysteresis in the predictions.

Such an enhanced address predictor accurately predicts 67% of all loads while keeping the misprediction rate close to 1%. We further show that the proposed predictor works reasonably well in a deeply pipelined architecture, where the predict-to-update delay may significantly impair both prediction rate and accuracy.
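The abstract does not spell out the predictor's internal organization, so the following Python sketch is only a rough illustration of the general ideas it names. It is not the authors' design. It pairs a per-load stride component (the "existing" scheme) with a context-based component that maps a load's recent address history to its next address. Each component is gated by a saturating confidence counter, so a load with no confident prediction is simply not speculated on, and the context table uses simple hysteresis: one miss lowers confidence before the stored target is replaced. All names (`HybridAddressPredictor`, `LoadEntry`), the history length of 2, the 2-bit counters, and the threshold are hypothetical choices. Global correlation across static loads, pollution filtering, and predict-to-update delay are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# All parameters below are illustrative assumptions, not values from the paper.
CONF_MAX = 3        # ceiling of a 2-bit saturating confidence counter
CONF_THRESHOLD = 2  # only predict when confidence reaches this value
HISTORY_LEN = 2     # number of recent addresses that form a load's context


@dataclass
class LoadEntry:
    """Per-static-load state, keyed by the load's PC (hypothetical layout)."""
    last_addr: int
    stride: int = 0
    stride_conf: int = 0
    history: Tuple[int, ...] = ()


class HybridAddressPredictor:
    """Stride predictor plus context-based predictor, each gated by confidence."""

    def __init__(self) -> None:
        self.loads: Dict[int, LoadEntry] = {}
        # (pc, recent address history) -> (predicted next address, confidence)
        self.context: Dict[Tuple[int, Tuple[int, ...]], Tuple[int, int]] = {}

    def predict(self, pc: int) -> Optional[int]:
        e = self.loads.get(pc)
        if e is None:
            return None
        # Prefer the context component when it is confident.
        target = self.context.get((pc, e.history))
        if target is not None and target[1] >= CONF_THRESHOLD:
            return target[0]
        if e.stride_conf >= CONF_THRESHOLD:
            return e.last_addr + e.stride
        return None  # not confident: skip speculation rather than risk a misprediction

    def update(self, pc: int, addr: int) -> None:
        e = self.loads.get(pc)
        if e is None:
            self.loads[pc] = LoadEntry(last_addr=addr, history=(addr,))
            return

        # Stride component: +1 confidence on a hit, reset and relearn the stride on a miss.
        if e.last_addr + e.stride == addr:
            e.stride_conf = min(e.stride_conf + 1, CONF_MAX)
        else:
            e.stride_conf = 0
            e.stride = addr - e.last_addr

        # Context component, trained only once the history is full.
        if len(e.history) == HISTORY_LEN:
            key = (pc, e.history)
            old = self.context.get(key)
            if old is None or (old[0] != addr and old[1] == 0):
                self.context[key] = (addr, 0)  # allocate, or replace an unconfident target
            elif old[0] == addr:
                self.context[key] = (addr, min(old[1] + 1, CONF_MAX))
            else:
                # Hysteresis: one miss lowers confidence but keeps the stored target.
                self.context[key] = (old[0], old[1] - 1)

        e.history = (e.history + (addr,))[-HISTORY_LEN:]
        e.last_addr = addr


if __name__ == "__main__":
    p = HybridAddressPredictor()
    for a in (1000, 1008, 1016, 1024):  # constant-stride load
        p.update(0x10, a)
    print(p.predict(0x10))  # 1032
    for a in [100, 260, 180] * 3 + [100, 260]:  # repeating, non-stride pattern
        p.update(0x20, a)
    print(p.predict(0x20))  # 180
```

In the usage example, the load at `0x20` cycles through addresses that no single stride describes, so its stride confidence never reaches the threshold. The context component learns the repeating sequence after a few iterations, which illustrates why context-based prediction can cover some of the loads that stride predictors miss.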