An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The effect of instruction fetch bandwidth on value prediction
Proceedings of the 25th annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Correlated load-address predictors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Register allocation for free: The C machine stack cache
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
A load-instruction unit for pipelined processors
IBM Journal of Research and Development
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
The predictability of load address
ACM SIGARCH Computer Architecture News
Direct load: dependence-linked dataflow resolution of load address and cache coordinate
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Dynamic memory instruction bypassing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Address-free memory access based on program syntax correlation of loads and stores
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Load elimination for low-power embedded processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Proceedings of the 32nd annual international symposium on Computer Architecture
RENO: A Rename-Based Instruction Optimizer
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing latencies of pipelined cache accesses through set prediction
Proceedings of the 19th annual international conference on Supercomputing
Dynamic memory instruction bypassing
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Reducing non-deterministic loads in low-power caches via early cache set resolution
Microprocessors & Microsystems
Working with process variation aware caches
Proceedings of the conference on Design, automation and test in Europe
Block remap with turnoff: a variation-tolerant cache design technique
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
A Tale of Two Processors: Revisiting the RISC-CISC Debate
Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
On reducing load/store latencies of cache accesses
Journal of Systems Architecture: the EUROMICRO Journal
Data value prefetching method based on Markov model
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in Intel's IA32 architecture, where the lack of architectural registers results in an increased number of memory accesses. This paper presents a novel, non-speculative technique that partially hides the growing load-to-use latency by allowing load instructions to issue early. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by following simple operations of the form reg±immediate, and it may be performed in any pipeline stage after instruction decode and prior to execution.

Several tracking schemes are proposed in this paper:
Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads, and 95% of these loads may be resolved in the front-end.
Absolute address tracking allows the early resolution of constant-address loads.
Displacement-based tracking targets all loads whose addresses have the form reg±immediate by tracking the values of all general-purpose registers. This class accounts for 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.

The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor, and discusses the integration of tracking with memory disambiguation.
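As a rough illustration of the register-tracking idea described in the abstract, the sketch below (not taken from the paper; the table layout, register names, and helper functions are assumptions made purely for illustration) maintains a small decode-time table of known register values, updates it on reg±immediate operations, invalidates an entry on any write it cannot follow, and resolves load addresses of the form reg±displacement whenever the base register's value is known.

```c
/* Illustrative sketch of decode-time register tracking (not the paper's design):
 * a small table holds the known value of each architectural register; simple
 * reg +/- immediate operations keep it up to date, any other write to a
 * register invalidates its entry, and a load of the form [reg + disp] can be
 * resolved in the front end whenever the base register's entry is valid. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS 8              /* EAX..EDI in IA32 */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

typedef struct {
    bool     valid;             /* do we know this register's value? */
    uint32_t value;             /* the tracked value, if valid */
} track_entry;

static track_entry track[NUM_REGS];

/* A register whose new value is a tracked register +/- an immediate stays
 * trackable, e.g. "sub esp, 16" or "add ebx, 4". */
static void on_add_imm(int dst, int src, int32_t imm) {
    if (track[src].valid) {
        track[dst].valid = true;
        track[dst].value = track[src].value + (uint32_t)imm;
    } else {
        track[dst].valid = false;
    }
}

/* Loading an immediate ("mov reg, imm") also yields a known value. */
static void on_mov_imm(int dst, uint32_t imm) {
    track[dst].valid = true;
    track[dst].value = imm;
}

/* Any write we cannot follow (load result, complex ALU op) kills the entry. */
static void on_unknown_write(int dst) {
    track[dst].valid = false;
}

/* Try to resolve a load "mov r, [base + disp]" at decode time. Returns true
 * and fills *addr if the base register is currently tracked. */
static bool resolve_load(int base, int32_t disp, uint32_t *addr) {
    if (!track[base].valid)
        return false;
    *addr = track[base].value + (uint32_t)disp;
    return true;
}

int main(void) {
    uint32_t addr;

    on_mov_imm(ESP, 0x0800F000u);       /* stack pointer established */
    on_add_imm(ESP, ESP, -16);          /* "sub esp, 16": frame allocation */

    /* Stack load "mov eax, [esp+8]" can be resolved in the front end. */
    if (resolve_load(ESP, 8, &addr))
        printf("stack load resolved early, addr = 0x%08x\n", (unsigned)addr);

    on_unknown_write(EBX);              /* e.g. ebx loaded from memory */

    /* Displacement load "mov ecx, [ebx+12]" cannot be resolved yet. */
    if (!resolve_load(EBX, 12, &addr))
        printf("load via ebx not resolvable at decode\n");
    return 0;
}
```

An address resolved this way is architecturally correct rather than predicted, but a load issued early from it must still be ordered against older in-flight stores, which is why the abstract notes the integration of tracking with memory disambiguation.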